Reputation: 33
I have two dataframes, one containing IP addresses (df_ip) and one containing IP networks (df_network).
The IPs and networks are objects created with ipaddress.ip_address and ipaddress.ip_network, which makes it possible to check whether an IP lies in a network (ip in network).
The dataframes look as follows:
df_ip:
            IP
0  10.10.10.10
1  10.10.20.10
2  10.10.20.20
df_network:
         NETWORK NETWORK_NAME
0  10.10.10.0/28      Subnet1
1  10.10.20.0/27      Subnet2
I want to merge/join df_ip with df_network, adding to each row the name of the network in which the IP lies.
For this small instance, it should return the following:
df_merged:
            IP NETWORK_NAME
0  10.10.10.10      Subnet1
1  10.10.20.10      Subnet2
2  10.10.20.20      Subnet2
My actual dataframes are much larger, so I'd prefer to avoid for-loops for the sake of efficiency.
How can I best achieve this? If this requires changing the datatypes, that's okay.
Note: I've added code below to create the data for convenience.
import pandas as pd
import ipaddress

# Create small IP DataFrame
values_ip = [ipaddress.ip_address('10.10.10.10'),
             ipaddress.ip_address('10.10.20.10'),
             ipaddress.ip_address('10.10.20.20')]
df_ip = pd.DataFrame()
df_ip['IP'] = values_ip

# Create small Network DataFrame
values_network = [ipaddress.ip_network('10.10.10.0/28'),
                  ipaddress.ip_network('10.10.20.0/27')]
names_network = ['Subnet1',
                 'Subnet2']
df_network = pd.DataFrame()
df_network['NETWORK'] = values_network
df_network['NETWORK_NAME'] = names_network
Upvotes: 2
Views: 565
Reputation: 561
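An efficient way to avoid explicit for-loops is to use NumPy arrays to check where ip & netmask == network_address, which is the test for whether an IP lies within a network. The comparison against all networks is vectorized, although apply still calls the function once per IP.
Note that the function below returns only the first matching network name.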
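To make the mask test concrete, here is a small check on a single address using only the standard ipaddress module. The variable names are illustrative and not part of the answer's code.

```python
import ipaddress

ip = ipaddress.ip_address("10.10.20.20")
network = ipaddress.ip_network("10.10.20.0/27")

# Masking the address keeps only its network bits
masked = int(ip) & int(network.netmask)

# The address is in the network when those bits equal the network address
in_network_by_mask = masked == int(network.network_address)

# The built-in membership test should agree with the mask test
in_network_builtin = ip in network

# An address just past the /27 range (10.10.20.0 to 10.10.20.31) fails the test
outside = ipaddress.ip_address("10.10.20.40")
outside_by_mask = (int(outside) & int(network.netmask)) == int(network.network_address)
```

Because the test is plain integer arithmetic, it can be applied to a whole array of netmasks and network addresses at once.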
import numpy as np

# Integer netmask and network address for every network
net_masks = df_network.NETWORK.apply(lambda x: int(x.netmask)).to_numpy()
network_addresses = df_network.NETWORK.apply(lambda x: int(x.network_address)).to_numpy()

def get_first_network(ip):
    # An IP lies in a network when (ip & netmask) == network_address
    is_in_network = (int(ip) & net_masks) == network_addresses
    indices = np.flatnonzero(is_in_network)
    if indices.size > 0:
        # Positional lookup, so it does not depend on df_network's index labels
        return df_network['NETWORK_NAME'].iloc[indices[0]]
    return None

df_ip['network_name'] = df_ip.IP.apply(get_first_network)
This results in:
            IP network_name
0  10.10.10.10      Subnet1
1  10.10.20.10      Subnet2
2  10.10.20.20      Subnet2
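If the per-IP apply call becomes the bottleneck, the same mask test can be broadcast over all IPs and all networks at once. The following is a minimal sketch of that variant, not part of the code above. It assumes IPv4 addresses only (so the integers fit in int64) and rebuilds the question's sample data so that it runs on its own. Names such as in_network, first_match and result are chosen here for illustration.

```python
import ipaddress

import numpy as np
import pandas as pd

# Same sample data as in the question
df_ip = pd.DataFrame({"IP": [ipaddress.ip_address(a) for a in
                             ("10.10.10.10", "10.10.20.10", "10.10.20.20")]})
df_network = pd.DataFrame({
    "NETWORK": [ipaddress.ip_network(n) for n in ("10.10.10.0/28", "10.10.20.0/27")],
    "NETWORK_NAME": ["Subnet1", "Subnet2"],
})

# Convert everything to plain integer arrays once (assumes IPv4 only)
ips = np.array([int(ip) for ip in df_ip["IP"]], dtype=np.int64)
masks = np.array([int(n.netmask) for n in df_network["NETWORK"]], dtype=np.int64)
bases = np.array([int(n.network_address) for n in df_network["NETWORK"]], dtype=np.int64)
names = df_network["NETWORK_NAME"].to_numpy()

# Broadcast to an (n_ips, n_networks) boolean membership matrix
in_network = (ips[:, None] & masks[None, :]) == bases[None, :]

# Index of the first matching network per IP; rows without a match get None
first_match = in_network.argmax(axis=1)
has_match = in_network.any(axis=1)

result = df_ip.copy()
result["NETWORK_NAME"] = np.where(has_match, names[first_match], None)
```

The trade-off is memory: the membership matrix has one entry per (IP, network) pair, so for very large inputs it may be necessary to process the IPs in chunks.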
Upvotes: 3