Achaius
Achaius

Reputation: 6124

How to filter dataframe based on ip range

I have dataframe which has 2 columns. I want to filter this dataframe based on ip ranges present in json file.

ip_ranges.json

[
    {"start": "45.43.144.0", "end": "45.43.161.255"}
    {"start": "104.222.130.0", "end": "104.222.191.255"}
    ...
]

Dataframe:

ip,p_value
97.98.173.96,3.7
73.83.192.21,6.9
...

Note: ip_range.json contains 100k elements and my dataframe has 300k rows.

Currently, I implemented like this

This process works fine for small set of ip_ranges. But for large set of ip_ranges, the process takes longer time to complete.

Is there any better approach to perform this more efficiently?

Upvotes: 0

Views: 717

Answers (1)

JoergVanAken
JoergVanAken

Reputation: 1286

Just an idea: Put you ranges into a dataframe ip_range with columns From and To. Convert all ip-addresses (the ones in df, too) to decimal numbers with the fast code provided for example here.

Now generating the ranges can be done fast:

ip_range['Rng'] = ip_range.apply(lambda x: np.arange(x.From, x.To+1), axis=1)

These ranges can be converted into a DataFrame:

ips = pd.DataFrame(itertools.chain(*ip_range['Rng']))

This DataFrame can easily be merged with df.

Upvotes: 1

Related Questions