Sebastian Naitsabes

Reputation: 27

Substring replacement based on a regular expression using pandas

I am trying to mask IP addresses stored in a pandas DataFrame using a regular expression.

I have a list of IP addresses and want to mask each one in the data frame. For example:

123.123.123.123 should become 12X.XXX.XXX.X23

23.123.123.123 should become 23.XXX.XXX.X23

I always keep the first two and the last two characters of the IP address and replace every other digit with X; the dots stay in place.

Upvotes: 2

Views: 47

Answers (2)

Wouter

Reputation: 3261

You can use a regular expression to replace every character other than a dot with X, except for the first two and the last two characters of the string.

import pandas as pd
import re

df = pd.DataFrame({'ip': ['123.123.123.123', '23.123.123.123']})
df['ip_masked'] = [re.sub(r'(?<!^)(?<!^.)[^.](?=.{2,}$)', 'X', x) for x in df.ip]
print(df)

                ip        ip_masked
0  123.123.123.123  12X.XXX.XXX.X23
1   23.123.123.123   23.XXX.XXX.X23
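
To make the pattern above easier to read, here is a sketch of the same regular expression written with re.VERBOSE, one condition per line. The names MASK_PATTERN and mask_ip are illustrative only and not part of the original answer.

```python
import re

# Same pattern as above, split up with re.VERBOSE so each part can be commented.
# MASK_PATTERN and mask_ip are hypothetical names chosen for this sketch.
MASK_PATTERN = re.compile(
    r"""
    (?<!^)      # not the first character of the string
    (?<!^.)     # not the second character of the string
    [^.]        # any single character except a dot
    (?=.{2,}$)  # at least two more characters follow before the end
    """,
    re.VERBOSE,
)


def mask_ip(ip: str) -> str:
    """Replace every non-dot character with X, keeping the first and last two."""
    return MASK_PATTERN.sub("X", ip)


if __name__ == "__main__":
    for ip in ["123.123.123.123", "23.123.123.123"]:
        print(ip, "->", mask_ip(ip))
```

Assuming the column is called ip as in the example, the helper could be applied with df['ip'].map(mask_ip) instead of the list comprehension.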

Upvotes: 2

David

Reputation: 891

This should help: keep the first two and last two characters as they are, and replace the digits in between with X.

import re
df['ip_masked'] = df.ip.str[:2] + df.ip.apply(lambda x: re.sub(r'\d', 'X', x)[2:-2]) + df.ip.str[-2:]
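
The same slicing idea can be sketched as a small standalone function, which may be easier to test than the one-liner. The name mask_ip_by_slicing is hypothetical, and the sketch assumes every value has at least four characters, which holds for any dotted IPv4 address.

```python
def mask_ip_by_slicing(ip: str) -> str:
    """Keep the first two and last two characters, mask digits in between."""
    # Assumes len(ip) >= 4; shorter strings would have overlapping head and tail.
    head, middle, tail = ip[:2], ip[2:-2], ip[-2:]
    # Replace digits with X and leave the dots untouched.
    masked_middle = "".join("X" if ch.isdigit() else ch for ch in middle)
    return head + masked_middle + tail


if __name__ == "__main__":
    for ip in ["123.123.123.123", "23.123.123.123"]:
        print(ip, "->", mask_ip_by_slicing(ip))
```

With a column named ip, as in the question, this could be used as df['ip'].map(mask_ip_by_slicing).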

Upvotes: 1
