Reputation: 691
How to replace X with _, given the following dataframe:
import pandas as pd
data = {'street':['13XX First St', '2XXX First St', '47X Second Ave'],
        'city':['Ashland', 'Springfield', 'Ashland']}
df = pd.DataFrame(data)
The streets need to be edited, replacing each X with an underscore _.
Notice that the number of digits varies, as does the number of Xs. Also, street names such as Xerxes should not be edited to _er_es, but left unchanged: only the street-number part should change.
The desired result is:
data = {'street':['13__ First St', '2___ First St', '47_ Second Ave'],
'city':['Ashland', 'Springfield', 'Ashland']}
df = pd.DataFrame(data)
Some potential regex building blocks include:
1. [0-9]+ to capture numbers
2. X+ to capture Xs
3. ([0-9]+)(X+) to capture groups
My attempt, which keeps only the Xs and drops the digits ('13XX First St' becomes 'XX First St'):
df['street'].replace(r"([0-9]+)(X+)", value=r"\2", regex=True)
I'm pretty weak with regex, so my approach may not be the best. Preemptive thank you for any guidance or solutions!
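To see why a fixed backreference template such as r"\2" cannot produce the right number of underscores, it helps to check what ([0-9]+)(X+) actually captures. A quick sketch with the standard re module on the question's sample values:

```python
import re

# Same building blocks as above: group 1 = leading digits, group 2 = the run of Xs
pattern = re.compile(r"([0-9]+)(X+)")
streets = ["13XX First St", "2XXX First St", "47X Second Ave"]

captured = [pattern.match(s).groups() for s in streets]
print(captured)
# → [('13', 'XX'), ('2', 'XXX'), ('47', 'X')]
```

The number of underscores has to equal len(group 2), which differs per row, so one straightforward option is to compute the replacement per match (for example with a callable passed as repl) rather than writing it as a fixed string.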
Upvotes: 2
Views: 554
Reputation: 23099
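IIUC, we can pass a function as the repl argument, much like re.sub (pandas 2.x requires regex=True for a callable repl):
import pandas as pd
def repl(m):
    # one underscore per character in the match
    return '_' * len(m.group())
df['street'].str.replace(r'X+', repl, regex=True)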
out:
0 13__ First St
1 2___ First St
2 47_ Second Ave
Name: street, dtype: object
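One caveat: X+ on its own matches every run of capital Xs, so a street name starting with X would be edited too. A quick check, using a made-up row "12X Xerxes Ave" purely for illustration:

```python
import pandas as pd

def repl(m):
    # Replace the whole match with one underscore per matched character
    return '_' * len(m.group())

# "12X Xerxes Ave" is a hypothetical row, not part of the question's data
s = pd.Series(['13XX First St', '12X Xerxes Ave'], name='street')
result = s.str.replace(r'X+', repl, regex=True)
print(result.tolist())
# → ['13__ First St', '12_ _erxes Ave']
```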
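If you need to replace only Xs that follow a digit (so a name like Xerxes is left alone), add a lookbehind (?<=\d). It requires a digit before the Xs without making the digit part of the match, so len(m.group()) still counts only the Xs:
df['street'].str.replace(r'(?<=\d)X+', repl, regex=True)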
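A lookbehind such as (?<=\d) is zero-width, so the digit is checked but not consumed and the replacement length matches the X count exactly. A sketch on the question's streets plus a hypothetical "12X Xerxes Ave" row:

```python
import pandas as pd

def repl(m):
    # The lookbehind is zero-width, so the match is only the run of Xs
    return '_' * len(m.group())

# The last value is hypothetical, added to show that street names are untouched
streets = pd.Series(['13XX First St', '2XXX First St', '47X Second Ave',
                     '12X Xerxes Ave'], name='street')

# (?<=\d): only Xs immediately preceded by a digit are replaced
result = streets.str.replace(r'(?<=\d)X+', repl, regex=True)
print(result.tolist())
# → ['13__ First St', '2___ First St', '47_ Second Ave', '12_ Xerxes Ave']
```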
Upvotes: 2
Reputation: 150745
IIUC, this would do:
def repl(m):
    # keep the digits (group 1) and turn each X (group 2) into '_'
    return m.group(1) + '_'*len(m.group(2))
df['street'].str.replace(r"^([0-9]+)(X*)", repl, regex=True)
Output:
0 13__ First St
1 2___ First St
2 47_ Second Ave
Name: street, dtype: object
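Because the pattern is anchored with ^ and consumes the digits first, only the leading street number can change. A sketch with two hypothetical edge cases, a street name starting with X and a number with no Xs at all:

```python
import pandas as pd

def repl(m):
    # Keep the digits (group 1) and swap each X in group 2 for an underscore
    return m.group(1) + '_' * len(m.group(2))

# Both rows are hypothetical edge cases, not from the question
s = pd.Series(['12X Xerxes Ave', '500 Main St'], name='street')
result = s.str.replace(r'^([0-9]+)(X*)', repl, regex=True)
print(result.tolist())
# → ['12_ Xerxes Ave', '500 Main St']
```

Using X* rather than X+ means a row without Xs still matches, but group 2 is empty, so the row is rewritten to itself.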
Upvotes: 3
Reputation: 134
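Assuming 'X' only occurs in the street-number part of the 'street' column (no names like Xerxes), a plain substitution of every X is enough. re.sub works on one string at a time, so apply it to each value rather than to str(df['street']), which is the printed representation of the whole Series:
df['street'] = [re.sub('X', '_', s) for s in df['street']]
This gives the desired output.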
Code I tested
import pandas as pd
import re
data = {'street':['13XX First St', '2XXX First St', '47X Second Ave'],
        'city':['Ashland', 'Springfield', 'Ashland']}
df = pd.DataFrame(data)
# re.sub handles a single string, so apply it to each street value
df['street'] = [re.sub('X', '_', s) for s in df['street']]
print(df)
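The same literal substitution can also be done without re or a loop, using pandas' vectorized Series.str.replace with regex=False. Like the answer above, this sketch assumes no street name contains a capital X:

```python
import pandas as pd

data = {'street': ['13XX First St', '2XXX First St', '47X Second Ave'],
        'city': ['Ashland', 'Springfield', 'Ashland']}
df = pd.DataFrame(data)

# regex=False treats 'X' as a literal string; every capital X becomes '_'
df['street'] = df['street'].str.replace('X', '_', regex=False)
print(df['street'].tolist())
# → ['13__ First St', '2___ First St', '47_ Second Ave']
```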
Upvotes: 0