Reputation: 777
I am searching for a data structure for IPv4 prefixes. What should it store? A prefix (base + mask) --> for example 85.23.0.0/16
base = 85.23.0.0 -> 32-bit unsigned int
mask = 16, i.e. 255.255.0.0 -> 8-bit unsigned char
So the min host is 85.23.0.0 and the max host is 85.23.255.255 (I know it should be .0.1 and .255.254 in the normal case, but I want to simplify it).
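The min/max host arithmetic above can be sketched like this (firstHost/lastHost are names made up for illustration):

```cpp
#include <cstdint>

// Lowest address covered by base/mask.
uint32_t firstHost(uint32_t base, unsigned mask)
{
    uint32_t m = mask ? (~0u << (32 - mask)) : 0u; // e.g. /16 -> 255.255.0.0
    return base & m;                               // clear the host bits
}

// Highest address covered by base/mask.
uint32_t lastHost(uint32_t base, unsigned mask)
{
    uint32_t m = mask ? (~0u << (32 - mask)) : 0u;
    return (base & m) | ~m;                        // set all host bits
}
```

For 85.23.0.0/16 this gives 85.23.0.0 and 85.23.255.255, matching the simplified range above.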
The main thing I require is speed of searching for an IP among the stored prefixes. For example, given an unsigned int (32-bit), I need to tell whether it is there or not.
I am writing in C++, so I can use the STL.
Right now it is stored in an STL set (pair of base + mask) and I am searching one by one, so it is roughly O(n) (and since a set is probably a BST, walking through it might be even slower than O(n)).
To sum up: I don't need an efficient way to store IPv4, I need an efficient way to SEARCH for it in some data structure. The data structure won't store port, family type, etc.; it will store a PREFIX (base + mask).
So I am searching for a data structure plus some search algorithm.
Upvotes: 5
Views: 1865
Reputation: 4189
Note: this method only works for masks in the range 1-31. To make the same binary search work for the full range (0-32), you should probably store the mask and address together in a 64-bit integer, for example:
std::vector<unsigned long long> networkAddrs( bases.size() );
...
networkAddrs[i] = ((unsigned long long)bases[i] << 32) + masks[i];
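A sketch of that 64-bit variant, keeping the marker-bit encoding but putting the 32-bit network address in the top half so all masks 0-32 work (insertMask64 is a name made up here, not from the answer):

```cpp
#include <cstdint>
#include <cassert>

// 64-bit variant of the marker-bit key, valid for masks 0-32.
// Layout: [32-bit masked network address][marker 1][zeros].
uint64_t insertMask64(uint32_t ip, unsigned mask)
{
    assert(mask <= 32);
    uint32_t longMask = mask ? (~0u << (32 - mask)) : 0u; // avoid UB shift by 32
    uint64_t net = (uint64_t)(ip & longMask) << 32;       // prefix in the top bits
    return net | (1ULL << (63 - mask));                   // marker bit right after the prefix
}
```

For mask = 32 the marker lands at bit 31, just below the address bits; for mask = 0 it is bit 63 with an all-zero address, so there are no collisions at either extreme.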
Using a simple property of network addresses, namely that the mask can be stored in the unused low bits of the network address, you can pack mask and base into a single 32-bit integer, sort a vector of them, and do a binary search. Something like this:
#include <iostream>
#include <vector>
#include <algorithm>

// Build the sortable key: the masked network address with a marker bit
// set immediately after the prefix (valid for masks 1-31).
unsigned int insertMask(unsigned int ip, unsigned int mask)
{
    unsigned int longMask = ~0u << (32 - mask);   // e.g. /24 -> 255.255.255.0
    return (ip & longMask) + (1u << (31 - mask)); // marker bit after the prefix
}

// Note: this checks for an exact (network, mask) entry,
// so the caller must supply the mask to test against.
bool isInNetwork(const std::vector<unsigned int>& nAs, unsigned int ip, unsigned int mask)
{
    unsigned int netAdd = insertMask(ip, mask);
    auto pos = std::lower_bound(nAs.begin(), nAs.end(), netAdd);
    return pos != nAs.end() && *pos == netAdd;
}

int main()
{
    std::vector<unsigned int> bases {
        (192u << 24) + (168u << 16)
        , 190u << 24
        , 191u << 24
        , 192u << 24
        , 193u << 24
        , 194u << 24
        , 195u << 24
        , 196u << 24
    };
    std::vector<unsigned int> masks { 24, 24, 24, 24, 24, 16, 8, 4 };

    std::vector<unsigned int> networkAddrs(bases.size());
    for (std::size_t i = 0; i < bases.size(); ++i)
        networkAddrs[i] = insertMask(bases[i], masks[i]);
    std::sort(networkAddrs.begin(), networkAddrs.end());

    unsigned int ip_addr = (192u << 24) + (168u << 16) + 17u; // 192.168.0.17
    unsigned int mask = 24;
    if (isInNetwork(networkAddrs, ip_addr, mask))
        std::cout << "TRUE";
    else
        std::cout << "FALSE";
    std::cout << '\n';
}
EDIT: I changed the method to encode the mask like this:
for ip XXX.XXX.YYY.YYY/16:
0b XXXXXXXX XXXXXXXX 10000000 00000000
for ip XXX.XXX.0xXY.YYY/12:
0b XXXXXXXX XXXXXXXX XXXX1000 00000000
Upvotes: 1
Reputation: 1889
You can use a CritBit tree. They are quite simple, but there are also many open-source variants out there. It is a searchable tree that uses prefix sharing, so that all values with the same prefix are stored in the same sub-tree. Depending on the implementation, the prefix sharing can also be used to reduce space requirements, because you only store the prefix once, not for every entry.
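The idea can be sketched as a plain binary trie (a simplified, non-path-compressed relative of a crit-bit tree; a real crit-bit implementation skips runs of non-branching bits):

```cpp
#include <cstdint>
#include <memory>

// One node per prefix bit; prefixes sharing leading bits share nodes.
struct Node {
    bool isPrefix = false;            // a stored prefix ends at this node
    std::unique_ptr<Node> child[2];   // next bit 0 / 1
};

void insertPrefix(Node& root, uint32_t base, unsigned mask)
{
    Node* n = &root;
    for (unsigned i = 0; i < mask; ++i) {
        unsigned bit = (base >> (31 - i)) & 1u;
        if (!n->child[bit]) n->child[bit] = std::make_unique<Node>();
        n = n->child[bit].get();
    }
    n->isPrefix = true;
}

// True if ip falls inside any stored prefix; at most 32 steps.
bool lookup(const Node& root, uint32_t ip)
{
    const Node* n = &root;
    for (unsigned i = 0; i < 32; ++i) {
        if (n->isPrefix) return true;            // a shorter prefix already matches
        unsigned bit = (ip >> (31 - i)) & 1u;
        if (!n->child[bit]) return false;
        n = n->child[bit].get();
    }
    return n->isPrefix;                          // /32 entry
}
```

Lookup cost is bounded by the address width, not the number of stored prefixes, which is the property that makes these trees attractive here.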
Upvotes: 0
Reputation: 94255
You may check the DXR paper 'Towards a Billion Routing Lookups per Second in Software' from 2012:
http://info.iet.unipi.it/~luigi/papers/20120601-dxr.pdf
This paper presents DXR, a lookup scheme based on transforming large routing tables into compact lookup structures which easily fit into cache hierarchies of modern CPUs.
Our transform distills a real-world BGP snapshot with 417,000 IPv4 prefixes and 213 distinct next hops into a structure consuming only 782 Kbytes, less than 2 bytes per prefix. Experiments show that the corresponding lookup algorithm scales linearly with the number of CPU cores: running on a commodity 8-core CPU it yields average throughput of 840 million lookups per second for uniformly random IPv4 keys.
While you say you "don't need efficient way to store IPV4", compact storage helps lookup performance, because main memory is slow and the CPU caches are fast (a variant of the "memory wall" in the computer memory hierarchy). A more compact representation takes fewer misses from the L3 cache into memory and may therefore be faster.
DXR has very good speed (on 3.6 GHz AMD FX-8150 with DDR3 1866MHz memory):
With random search keys (a worst-case situation) D18R goes from 117 million lookups per second (Mlps) on a single core, to 840 Mlps with all 8 cores active. ... Depending on the configuration and query pattern, DXR achieves between 50 and 250 million lookups per second on 1 core
117 Mlps at 3.6 GHz is around 30 CPU ticks per lookup; meanwhile the random-access latency of DDR3-1866 memory is more than 9-10 ns, or 32-36 CPU cycles, just to get the first bytes from DRAM to the memory controller when the DRAM row is already open (usually it is not, and opening a row is slow too). Additional time is needed to read out the full cache line and to forward it through L3, L2, and L1 to the registers. Full memory latency may be close to 100 ns (360 CPU cycles).
Upvotes: 3
Reputation: 66
Why not use std::unordered_map? Lookup is O(1) on average and O(n) in the worst case; or you can use std::map if you want guaranteed O(log n) (if you REALLY care only about performance of the searching phase). Unless your dataset is large you might find there is no significant difference.
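The answer doesn't spell out how hashing applies to prefix matching; one possible sketch (PrefixHash is a made-up helper, not a standard facility) stores each masked network in a hash set and probes once per distinct mask length in use:

```cpp
#include <cstdint>
#include <unordered_set>
#include <set>

struct PrefixHash {
    std::set<unsigned> lengths;        // distinct mask lengths currently stored
    std::unordered_set<uint64_t> nets; // key = (masked base << 6) | mask

    static uint64_t key(uint32_t base, unsigned mask) {
        uint32_t m = mask ? (~0u << (32 - mask)) : 0u; // avoid UB shift by 32
        return ((uint64_t)(base & m) << 6) | mask;     // mask fits in 6 bits
    }
    void insert(uint32_t base, unsigned mask) {
        lengths.insert(mask);
        nets.insert(key(base, mask));
    }
    // True if ip falls inside any stored prefix:
    // one O(1)-average probe per distinct mask length.
    bool contains(uint32_t ip) const {
        for (unsigned m : lengths)
            if (nets.count(key(ip, m))) return true;
        return false;
    }
};
```

With at most 33 possible mask lengths, lookup stays bounded regardless of how many prefixes are stored, at the cost of one hash probe per length actually in use.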
Upvotes: 2