How to calculate a hash value from different fields of an IP packet

Question

I need to implement a hash table to store IP packets. However, no single element (say, the IP address) identifies a packet uniquely, so I cannot build the hash key from one element alone. The following elements together make a packet unique:

1. Source IP address (16 byte string, due to IPv6 format)
2. Source port (2 byte)
3. Destination IP address (16 byte again)
4. Destination port (2 byte)
5. id1 (1 byte)

I know that with a single element, the hash value can be calculated using any well-known algorithm such as MD5. My question is: how can I include multiple elements, like those above, in the hash value calculation?

Brendan · Accepted Answer

To create an effective hash, first determine which data you're going to use for lookups. For example, if you're going to search for all packets sent from a certain IP address, then you only want to hash the source IP address. You wouldn't want to hash the source IP address and source port together, because that would mean doing 65536 searches (one per possible port) to find all packets sent from a certain IP address.

The next step is to determine the most effective hash size. This tends to depend on the amount of data and the size of the CPU's caches. If the hash size is too small (e.g. 8 bits, which gives only 256 buckets) then you end up with long lists of entries for each hash, which increases the time to find anything. If the hash size is too big (e.g. 24 bits, which gives about 16.7 million buckets) then you get frequent cache misses when trying to find the list of entries for the hash.
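To make the size trade-off concrete, here is a minimal sketch in C. The names `packet_entry`, `bucket_array_bytes` and `bucket_index` are made up for illustration, and it assumes a chained table whose bucket array holds one pointer per bucket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct packet_entry;  /* hypothetical chained entry type, defined elsewhere */

/* Memory taken by the array of bucket heads alone (one pointer per bucket). */
static size_t bucket_array_bytes(unsigned hash_bits)
{
    assert(hash_bits >= 1 && hash_bits <= 24);
    return ((size_t)1 << hash_bits) * sizeof(struct packet_entry *);
}

/* Reduce a wider hash value to a bucket index by keeping the low bits. */
static uint32_t bucket_index(uint32_t hash, unsigned hash_bits)
{
    assert(hash_bits >= 1 && hash_bits <= 24);
    return hash & ((UINT32_C(1) << hash_bits) - 1u);
}
```

With 8-byte pointers, `bucket_array_bytes(8)` is 2 KiB, `bucket_array_bytes(16)` is 512 KiB and `bucket_array_bytes(24)` is 128 MiB, which is roughly why a very large hash stops fitting in cache while a very small one forces long chains.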

Please note that you can also have multiple levels. For example, if you want to search for the packets from a specific port and IP address, you could hash the IP address alone to index a first hash table whose entries point to second-level hash tables, and then hash the port separately to index the second-level table.
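One way the two-level idea might look in C is sketched below. Everything here is an assumption for illustration: the struct names, the table sizes (256 and 64 buckets), the two tiny hash functions, and the choice to allocate second-level tables lazily. Because different addresses can share a first-level bucket, the final chain walk still compares the full address and port.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define IP_BUCKETS   256u  /* assumed size: 8-bit first-level hash */
#define PORT_BUCKETS 64u   /* assumed size: 6-bit second-level hash */

struct packet {
    uint8_t src_ip[16];
    uint16_t src_port;
    struct packet *next;   /* chain within one second-level bucket */
};

struct port_table { struct packet *bucket[PORT_BUCKETS]; };
struct ip_table   { struct port_table *bucket[IP_BUCKETS]; };

/* First-level hash: XOR of the 16 address bytes, so the result is 0..255. */
static unsigned ip_hash(const uint8_t ip[16])
{
    unsigned h = 0;
    for (int i = 0; i < 16; i++)
        h ^= ip[i];
    return h;
}

/* Second-level hash: fold the port and reduce it to 0..63. */
static unsigned port_hash(uint16_t port)
{
    return ((unsigned)port ^ ((unsigned)port >> 6)) % PORT_BUCKETS;
}

/* Insert at the head of the chain; returns 0 on success, -1 if out of memory. */
static int two_level_insert(struct ip_table *t, struct packet *p)
{
    assert(t != NULL && p != NULL);
    struct port_table **slot = &t->bucket[ip_hash(p->src_ip)];
    if (*slot == NULL) {
        *slot = calloc(1, sizeof **slot);  /* second level created on demand */
        if (*slot == NULL)
            return -1;
    }
    struct packet **head = &(*slot)->bucket[port_hash(p->src_port)];
    p->next = *head;
    *head = p;
    return 0;
}

/* Find one packet with exactly this address and port, or NULL. */
static struct packet *two_level_find(const struct ip_table *t,
                                     const uint8_t ip[16], uint16_t port)
{
    const struct port_table *pt = t->bucket[ip_hash(ip)];
    if (pt == NULL)
        return NULL;
    for (struct packet *p = pt->bucket[port_hash(port)]; p != NULL; p = p->next)
        if (p->src_port == port && memcmp(p->src_ip, ip, 16) == 0)
            return p;
    return NULL;
}
```

A side effect of this layout is that "all packets from one address" only needs the first-level lookup followed by a walk over that one second-level table, rather than one search per port.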

Once you've decided what information you need to use for the hash and the hash size, the next step is to determine how to calculate the hash in a way that minimises collisions. This calculation needs to be fast: you don't want a large amount of overhead that attempts to prevent a small amount of overhead (and using something complex like MD5 would be a bad idea). Often simple methods like "XOR and shift" are fast and effective. For example, for a 16-byte IP address and a 16-bit hash you might just do "hash ^= (hash << 3) | next_pair_of_bytes;" 8 times, once for each pair of bytes in the address.

Finally, you want to tune it. Mostly you want to adjust the hash size and try a few different hash calculations to see if it improves performance. All of the above relies on assumptions about the data and cache sizes, and these assumptions may be wrong in practice. For example, maybe most packets come from a single IP address and using the IP address in the hash is a waste of time; maybe other parts of the program are consuming lots of cache and attempting to minimise cache misses was a bad idea (and a much larger hash might improve performance); maybe there isn't as much data as you thought and you're not getting many hash collisions and reducing the hash size can improve performance; etc.
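A simple way to start tuning is to record the hash values of a sample of real packets and look at how they spread over the buckets for a given hash size. The C sketch below does only that. The names `bucket_stats` and `measure_buckets` are hypothetical, and it measures chain lengths only, not cache behaviour, so actual lookup time still needs to be benchmarked separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct bucket_stats {
    size_t used_buckets;  /* buckets holding at least one entry */
    size_t max_chain;     /* longest chain in any bucket */
    double avg_chain;     /* entries per non-empty bucket */
};

/* Count how n sample hashes spread over 2^hash_bits buckets.
   Returns 0 on success, -1 if the counter array cannot be allocated. */
static int measure_buckets(const uint32_t *hashes, size_t n,
                           unsigned hash_bits, struct bucket_stats *out)
{
    assert(hash_bits >= 1 && hash_bits <= 24);
    size_t nbuckets = (size_t)1 << hash_bits;
    size_t *count = calloc(nbuckets, sizeof *count);
    if (count == NULL)
        return -1;

    for (size_t i = 0; i < n; i++)
        count[hashes[i] & (nbuckets - 1)]++;

    out->used_buckets = 0;
    out->max_chain = 0;
    for (size_t b = 0; b < nbuckets; b++) {
        if (count[b] == 0)
            continue;
        out->used_buckets++;
        if (count[b] > out->max_chain)
            out->max_chain = count[b];
    }
    out->avg_chain = out->used_buckets ? (double)n / (double)out->used_buckets
                                       : 0.0;
    free(count);
    return 0;
}
```

Running it on the same sample with several values of `hash_bits` and a few candidate hash calculations gives a quick view of whether a change reduces the long chains or just spends more memory.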
