Reputation: 27
I am trying to mask the IP addresses stored in a data frame column using a regex.
For example, given
123.123.123.123
I expect to get 12X.XXX.XXX.X23, and given
23.123.123.123
I expect to get 23.XXX.XXX.X23.
So I always keep the first two and the last two characters of the IP address and replace every digit in between with X, leaving the dots in place.
Upvotes: 2
Views: 47
Reputation: 3261
You can use a regular expression to replace every character that is not a dot with X, except for the first two and the last two characters of the string.
import pandas as pd
import re
df = pd.DataFrame({'ip': ['123.123.123.123', '23.123.123.123']})
# Raw string avoids invalid-escape warnings; a dot needs no escaping inside [...]
df['ip_masked'] = [re.sub(r'(?<!^)(?<!^.)[^.](?=.{2,}$)', 'X', x) for x in df.ip]
print(df)
                ip        ip_masked
0  123.123.123.123  12X.XXX.XXX.X23
1   23.123.123.123   23.XXX.XXX.X23
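To make it easier to see why this pattern leaves the first two and last two characters alone, here is the same regex written in verbose mode with one comment per piece. This is only a sketch: the names MASK_PATTERN and mask_ip are made up for illustration and are not part of the code above.

```python
import re

# Hypothetical names; the pattern itself is the one used above,
# only spelled out with re.VERBOSE so each part can be commented.
MASK_PATTERN = re.compile(
    r"""
    (?<!^)      # not at index 0, so the 1st character is kept
    (?<!^.)     # not at index 1, so the 2nd character is kept
    [^.]        # match one character that is not a dot
    (?=.{2,}$)  # at least two characters must follow, so the last two are kept
    """,
    re.VERBOSE,
)


def mask_ip(ip: str) -> str:
    """Replace every matched character with 'X'."""
    return MASK_PATTERN.sub("X", ip)


print(mask_ip("123.123.123.123"))  # 12X.XXX.XXX.X23
print(mask_ip("23.123.123.123"))   # 23.XXX.XXX.X23
```

If you prefer this form, something like df['ip'].map(mask_ip) should apply it to the column.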
Upvotes: 2
Reputation: 891
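This should also work: keep the first two and the last two characters with string slicing, and replace the digits in between with X (it needs import re as well).
df['ip_masked'] = df.ip.str[:2] + df.ip.apply(lambda x: re.sub(r'\d', 'X', x)[2:-2]) + df.ip.str[-2:]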
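The same slicing idea can also be written without a regex. The sketch below uses a made-up helper, mask_middle, with a hypothetical keep parameter. It additionally assumes you want strings that are too short to mask returned unchanged; with the one-liner above, a string shorter than four characters would have its head and tail slices overlap.

```python
def mask_middle(ip: str, keep: int = 2) -> str:
    """Keep `keep` characters at each end and mask the rest, leaving dots alone.

    Hypothetical helper: the name and the `keep` parameter are illustrative.
    """
    # Assumption: nothing to mask if the kept head and tail cover the whole string.
    if keep < 0 or len(ip) <= 2 * keep:
        return ip
    end = len(ip) - keep  # explicit index so keep=0 also works
    middle = "".join(ch if ch == "." else "X" for ch in ip[keep:end])
    return ip[:keep] + middle + ip[end:]


print(mask_middle("123.123.123.123"))  # 12X.XXX.XXX.X23
print(mask_middle("23.123.123.123"))   # 23.XXX.XXX.X23
```

Note that this version masks every non-dot character rather than only digits, which gives the same result for ordinary IPv4 strings. It could be applied with df['ip'].map(mask_middle).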
Upvotes: 1