The third and last drawback is that some TCP options are only sent in the SYN, and some of these options are particularly important for performance.
The MSS, or "Maximum Segment Size", is the TCP-level equivalent of the MTU. It avoids fragmentation of the IP packet. This option is so critical that it is encoded in the signature of the SYN cookie so that it can be recovered from the ACK.
The SACK, or "Selective ACK", is used to indicate exactly what has been received (or not), in order to limit the amount of data that has to be retransmitted when packets are lost.
The WScale, or "Window scale", raises the maximum size of the TCP window from 64 KB to 1 GB. This is especially important for connections with a high bandwidth-delay product.
The ECN, or "Explicit Congestion Notification", allows a sender to be notified proactively that congestion is building up on the network, so that it can adapt its flow rate before any loss occurs. It is not an actual option but simply a part of the TCP header. In a SYN, it indicates that the sender is able to signal congestion. In any other packet, it simply indicates congestion. ECN is usually not supported, but it can be important in some cases.
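To put numbers on the window scale option, here is a minimal sketch of the arithmetic. The function names are ours, chosen for illustration; the 16-bit window field and the maximum shift of 14 come from RFC 7323.

```python
# Window scale arithmetic: the 16-bit window field of the TCP header is
# shifted left by the scale factor negotiated in the SYN.
MAX_WINDOW_FIELD = 0xFFFF  # largest value of the 16-bit window field
MAX_WSCALE_SHIFT = 14      # largest shift allowed by RFC 7323


def effective_window(window_field: int, wscale: int) -> int:
    """Return the window in bytes for a given header field and scale factor."""
    if not 0 <= window_field <= MAX_WINDOW_FIELD:
        raise ValueError("window field must fit in 16 bits")
    if wscale < 0:
        raise ValueError("window scale cannot be negative")
    # A shift larger than 14 is capped at 14.
    return window_field << min(wscale, MAX_WSCALE_SHIFT)


def bandwidth_delay_product(bits_per_second: int, rtt_ms: int) -> int:
    """Bytes that must be in flight to keep a link of this speed busy."""
    return bits_per_second * rtt_ms // 8000


if __name__ == "__main__":
    print(effective_window(MAX_WINDOW_FIELD, 0))    # without window scale
    print(effective_window(MAX_WINDOW_FIELD, 14))   # with the maximum scale
    print(bandwidth_delay_product(10**9, 100))      # 1 Gbit/s link, 100 ms RTT
```

On a 1 Gbit/s link with 100 ms of latency, about 12.5 MB must be in flight to fill the pipe, far more than an unscaled 64 KB window allows. This is why losing WScale hurts on such links.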
These options (MSS, WScale, SACK Permitted) are only sent in the SYN. They are normally stored in the connection queue, so when SYN cookies are used and nothing is stored, they are lost. Ideally the client would resend them, but nothing in the protocol makes it do so. Since Linux 2.6.26, these options are encoded in the least significant bits of the timestamp, somewhat in the manner of steganography. The timestamp is a little-known extension of TCP, yet it is widely used by operating systems. It allows them, among other things, to accurately measure the RTT (round-trip time), meaning the time between sending a packet and receiving a response, just like an ICMP ping. This option works with two fields: the timestamp of the sender and a copy of the last timestamp received from the peer. In practice, if the server encodes the options in its own timestamp, the client will return them to it in the ACK. The server can then decode and recover them. This remedies one of the major limitations of SYN cookies, and it does not require any specific support on the client side.
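The timestamp trick described above can be sketched as follows. This is a simplified illustration, not the kernel code: the bit layout (4 bits of window scale, 1 bit for SACK, 1 bit for ECN) and every name in the block are assumptions made for the example.

```python
from typing import NamedTuple, Optional

# Assumed layout of the low bits of the server timestamp (illustrative):
#   bits 0-3: window scale (0xF meaning "no window scale offered")
#   bit 4:    SACK permitted
#   bit 5:    ECN
TSBITS = 6
TSMASK = (1 << TSBITS) - 1
WSCALE_MASK = 0xF
WSCALE_ABSENT = 0xF
SACK_BIT = 1 << 4
ECN_BIT = 1 << 5


class SynOptions(NamedTuple):
    wscale: Optional[int]  # None when the client did not offer window scale
    sack: bool
    ecn: bool


def encode_timestamp(now: int, opts: SynOptions) -> int:
    """Hide the SYN options in the low bits of the timestamp sent in the SYN-ACK."""
    if opts.wscale is not None and not 0 <= opts.wscale <= 14:
        raise ValueError("window scale must be between 0 and 14")
    bits = WSCALE_ABSENT if opts.wscale is None else opts.wscale
    if opts.sack:
        bits |= SACK_BIT
    if opts.ecn:
        bits |= ECN_BIT
    ts = (now & ~TSMASK) | bits
    if ts > now:
        # Never send a timestamp from the future: step back one slot.
        ts -= 1 << TSBITS
    return ts & 0xFFFFFFFF


def decode_timestamp(echoed: int) -> SynOptions:
    """Recover the options from the timestamp the client echoes in its ACK."""
    bits = echoed & TSMASK
    wscale = bits & WSCALE_MASK
    return SynOptions(
        wscale=None if wscale == WSCALE_ABSENT else wscale,
        sack=bool(bits & SACK_BIT),
        ecn=bool(bits & ECN_BIT),
    )


if __name__ == "__main__":
    sent = encode_timestamp(1_000_000, SynOptions(wscale=7, sack=True, ecn=False))
    print(sent, decode_timestamp(sent))
```

The cost of the trick is a little timestamp precision: the low bits no longer carry time, so the clock seen by the client moves in coarser steps.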
Because of these limitations, in Linux, the SYN cookie mechanism will only be activated during a queue overflow if the net.ipv4.tcp_syncookies parameter is set to 1.
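As a quick check, the current setting can be read from procfs. The sketch below assumes a Linux host exposing /proc/sys/net/ipv4/tcp_syncookies; the helper names are ours, and the meaning of values 0 and 2 is taken from the kernel's ip-sysctl documentation rather than from this article.

```python
from pathlib import Path
from typing import Optional

# procfs view of the net.ipv4.tcp_syncookies sysctl (Linux only).
SYNCOOKIES_SYSCTL = Path("/proc/sys/net/ipv4/tcp_syncookies")

MODES = {
    0: "disabled",
    1: "enabled only when the SYN queue overflows",
    2: "always enabled",
}


def describe_syncookies(raw: str) -> str:
    """Translate the raw sysctl value into a human-readable mode."""
    value = int(raw.strip())
    return MODES.get(value, f"unknown value {value}")


def read_syncookies(path: Path = SYNCOOKIES_SYSCTL) -> Optional[str]:
    """Return the current mode, or None if the sysctl cannot be read."""
    try:
        return describe_syncookies(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(read_syncookies())
```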
More anecdotally, the trick of encoding the main options in the timestamp was integrated into Linux at a time when SYN cookies were considered potentially obsolete. One reason behind this was the loss of these options. It was also suggested that computers had become powerful enough to handle such attacks without requiring a particular defense mechanism. But several developers stepped up and refuted this suggestion, offering benchmarks as proof. One of these developers is Willy Tarreau, author of HAProxy, on which the new IP Load Balancing offered by OVH is based. If you want more information, I encourage you to read the following excellent article: https://lwn.net/Articles/277146/.
As demonstrated by Willy Tarreau, the problem is not solved without SYN cookies. And even with SYN cookies, the computational load increased from 60% to 70% in the tests he ran. Indeed, although it is no longer possible to make the connection queue overflow, it is still possible to overload the CPU and render the machine completely unavailable by flooding it with SYN packets, because of the processing time required for each packet.
In Linux, the "SYNPROXY" iptables target has optimised the management of SYN cookies since kernel version 3.12, by handling them much earlier in the packet processing path, before they reach the TCP stack. This takes load off the CPU and allows the machine to better bear the burden. This implementation uses a slightly different strategy. Instead of rebuilding the connection from the ACK, it regenerates the SYN packet that the TCP stack should have received and ensures that the sequence numbers stay consistent. This approach is possible because the iptables target runs inside the same Linux kernel and therefore has access to the same primitives.
When we designed the infrastructure of the new IP Load Balancing offer, we paid particular attention to the various types of TCP/IP attacks, including SYN floods, which are unfortunately all too common.
On a machine equipped with a 24-core Intel(R) Xeon(R) CPU E5-2650L v3 @ 1.80GHz, without HyperThreading, 256GB of RAM and a Mellanox ConnectX-4 100Gbit/s card, synchronised at 40Gbit/s (for better stability), we started to lose connections at 4M PPS (2 Gbps), and the machine became unreachable over SSH at 7M PPS. These tests were performed with the iptables SYNPROXY target.
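The relationship between packets per second and bandwidth in these figures can be checked with a little arithmetic. The sketch below assumes that each SYN travels in a minimum-size 64-byte Ethernet frame, which is our assumption and not a detail of the test setup; the function names are illustrative.

```python
# Packets per second versus bandwidth for small packets.
MIN_FRAME_BYTES = 64      # minimum Ethernet frame, FCS included (assumed SYN size)
WIRE_OVERHEAD_BYTES = 20  # preamble and start delimiter (8) + inter-frame gap (12)


def throughput_gbps(pps: int, frame_bytes: int = MIN_FRAME_BYTES,
                    on_the_wire: bool = False) -> float:
    """Bandwidth in Gbit/s used by `pps` frames of `frame_bytes` bytes."""
    size = frame_bytes + (WIRE_OVERHEAD_BYTES if on_the_wire else 0)
    return pps * size * 8 / 1e9


def line_rate_pps(link_bits_per_second: int,
                  frame_bytes: int = MIN_FRAME_BYTES) -> int:
    """Maximum number of frames per second that a link can carry."""
    return link_bits_per_second // ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


if __name__ == "__main__":
    print(throughput_gbps(4_000_000))        # rate at which connections were lost
    print(throughput_gbps(7_000_000))        # rate at which SSH became unreachable
    print(line_rate_pps(40_000_000_000))     # ceiling of a 40 Gbit/s link
```

Under that assumption, 4M PPS comes to roughly 2 Gbit/s, which matches the figure above, while a 40 Gbit/s link could in theory carry close to 60M minimum-size packets per second. The machine gives up long before the link is full.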
7M PPS before losing a machine may seem like a high value, but it is in fact entirely insufficient. We have already endured such attacks, and protected our customers from them, at rates of several tens of millions of packets per second. While the technique is a good one per se, a single machine cannot withstand such a sustained attack.