Using hashing for efficient Deep Packet Inspection

In order to enhance the performance of Deep Packet Inspection we are preprocessing the set of rules by performing a hashing algorithm on them which in turn divides the rules into smaller chunks of sub-rules, making the inspection much faster.

The hashing is done on the first 17 bits of the originally 104. After the preprocessing is done, whenever a packet arrives, we hash its first 17 bits and check it against the much smaller set of rules based on the result.

(The algorithm is used twice, after hashing the first 17 it hashes the next 16 bits and stores the results as well, but for this specific problem we can assume that we're only performing a simple hash on a fixed number of bits)

The algorithm is indeed efficient, however, we can't seem to find a way to apply it on entries with don't care bits - which we get a lot.

We searched for a solution in numerous places, and tried for instance a suggestion of duplicating rules with don't care bits. It didn't work however, for the vast amount of memory it would take (for each don't care bit of the 17 of the numerous rules there is an option of it being either 1 or zero - this would demand an exponential amount of space).

We would very much appreciate any suggestion or insight, even a partial solution would be great.

Note: There is no limit on preprocessing time or additional space as long as it is not exponential or anything impractical.

Upvotes: 3

Answers (2)

mcdowella

Reputation: 19621

If you use the hash table as a cache and revert to something slower if an entry for the current value isn't found then you don't need to populate it completely. You could either build it ahead of time based on an analysis of previous traffic, creating as many entries as you can afford, or you could populate it dynamically, creating new entries after you process a packet when an entry was not found, and removing old entries that had not been used for some time to reclaim store.

Upvotes: 2

ruakh

Reputation: 183564

2¹⁷ is 131,072: large, but not inordinately so. If you use a bit of indirection (for example, storing your rules in an array without duplication, and then building a size-2¹⁷ table of indices into that array), then you should be able to do this in well under 1 MB.

Upvotes: 0

Using hashing for efficient Deep Packet Inspection

Answers (2)

Related Questions