MinneapolisCoder9

Reputation: 691

Pandas regex, replace group with char

Problem

How can I replace each X with _ in the following DataFrame?

import pandas as pd

data = {'street':['13XX First St', '2XXX First St', '47X Second Ave'],
        'city':['Ashland', 'Springfield', 'Ashland']}
df = pd.DataFrame(data)

The streets need to be edited, replacing each X with an underscore _.

Notice that the number of digits varies, as does the number of Xs. Also, street names such as Xerxes should not be edited to _er_es, but left unchanged. Only the street-number part should change.

Desired Output

data = {'street':['13__ First St', '2___ First St', '47_ Second Ave'], 
        'city':['Ashland', 'Springfield', 'Ashland']} 
df = pd.DataFrame(data) 

Progress

Some potential regex building blocks include:
1. [0-9]+ to capture numbers
2. X+ to capture Xs
3. ([0-9]+)(X+) to capture the digits and the Xs as two separate groups
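The building blocks above can be combined into one pattern anchored at the start of the string, with a function deciding the replacement. This is a minimal sketch on plain strings using only the standard re module; the helper name mask_street_number and the "12XX Xerxes Ave" test string are hypothetical.

```python
import re

# Anchored at the start (^) so only the leading house number is touched.
# Group 1 = the digits, group 2 = the run of Xs that follows them.
PATTERN = re.compile(r"^([0-9]+)(X+)")

def mask_street_number(street: str) -> str:
    """Hypothetical helper: keep the digits, turn each following X into '_'."""
    return PATTERN.sub(lambda m: m.group(1) + "_" * len(m.group(2)), street)

for s in ["13XX First St", "2XXX First St", "47X Second Ave", "12XX Xerxes Ave"]:
    print(mask_street_number(s))
```

Because the pattern is anchored and requires digits first, an X that starts a street name is never part of a match.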

df['street'].replace(r"([0-9]+)(X+)", value=r"\2", regex=True, inplace=False)

I'm pretty weak with regex, so my approach may not be the best. Thanks in advance for any guidance or solutions!

Upvotes: 2

Views: 554

Answers (3)

Umar.H

Reputation: 23099

IIUC, we can pass a function as the repl argument, much like with re.sub:

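import pandas as pd

def repl(m):
    # one underscore per character in the match
    return '_' * len(m.group())

# regex=True is required for a callable repl (pandas 2.x defaults to regex=False)
df['street'].str.replace(r'X+', repl, regex=True)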

out:

0     13__ First St
1     2___ First St
2    47_ Second Ave
Name: street, dtype: object
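Note that the pattern X+ matches every run of capital Xs, not just the ones in the house number. A quick check, assuming pandas is installed; the "12XX Xerxes Ave" row is a hypothetical extra example. regex=True is passed explicitly because pandas 2.x defaults str.replace to regex=False and rejects a callable repl in that mode.

```python
import pandas as pd

def repl(m):
    # Replace the whole match with one underscore per matched character
    return '_' * len(m.group())

streets = pd.Series(['13XX First St', '12XX Xerxes Ave'])
# regex=True is required when repl is a callable
result = streets.str.replace(r'X+', repl, regex=True)
print(result.tolist())
```

Here the capital X at the start of Xerxes is also turned into an underscore (the lowercase x is not matched), which is why the street name needs extra protection.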

If you need to match only Xs that directly follow a digit, add a lookbehind, (?<=\d). It checks for the digit without including it in the match, so the digit is not replaced along with the Xs:

df['street'].str.replace(r'(?<=\d)X+', repl, regex=True)
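The difference between a lookbehind and a consuming \d matters here because repl counts every character of the match. A small comparison, assuming pandas is installed; the "12XX Xerxes Ave" row is a hypothetical extra example.

```python
import pandas as pd

def repl(m):
    # One underscore per character in the match
    return '_' * len(m.group())

streets = pd.Series(['13XX First St', '2XXX First St', '12XX Xerxes Ave'])

# (?<=\d) asserts a digit right before the Xs but leaves it out of the match,
# so only the Xs are counted and replaced
masked = streets.str.replace(r'(?<=\d)X+', repl, regex=True)

# A consuming \d becomes part of the match, so the last digit is replaced too
consumed = streets.str.replace(r'\dX+', repl, regex=True)

print(masked.tolist())
print(consumed.tolist())
```

With the consuming version, '13XX First St' becomes '1___ First St', while the lookbehind version keeps the digits and also leaves Xerxes alone, since that X follows a space.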

Upvotes: 2

Quang Hoang

Reputation: 150745

IIUC, this would do:

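import pandas as pd

def repl(m):
    # group(1): leading digits, kept as-is; group(2): the Xs, one '_' each
    return m.group(1) + '_'*len(m.group(2))

# regex=True is required for a callable repl (pandas 2.x defaults to regex=False)
df['street'].str.replace("^([0-9]+)(X*)", repl, regex=True)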

Output:

0     13__ First St
1     2___ First St
2    47_ Second Ave
Name: street, dtype: object
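Because the pattern is anchored with ^ and uses X* (zero or more), rows without any Xs pass through unchanged and an X inside a street name is never reached. A quick edge-case check, assuming pandas is installed; the "100 Main St" and "12XX Xerxes Ave" rows are hypothetical examples.

```python
import pandas as pd

def repl(m):
    # group(1): leading digits, kept as-is; group(2): the Xs, one '_' each
    return m.group(1) + '_' * len(m.group(2))

streets = pd.Series(['13XX First St', '100 Main St', '12XX Xerxes Ave'])
result = streets.str.replace(r'^([0-9]+)(X*)', repl, regex=True)
print(result.tolist())
```

For '100 Main St', X* matches the empty string, so repl returns just the digits and the value is unchanged.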

Upvotes: 3

SublimizeD

Reputation: 134

Assuming 'X' only occurs in the street numbers of the 'street' column (a street name like Xerxes would also be changed):

df['street'] = df['street'].apply(lambda s: re.sub('X', '_', s))

This should give your desired output.

Code I tested

import pandas as pd
import re

data = {'street':['13XX First St', '2XXX First St', '47X Second Ave'],
        'city':['Ashland', 'Springfield', 'Ashland']}
df = pd.DataFrame(data)
# Replace every X in each street string (street names containing X are changed too)
df['street'] = df['street'].apply(lambda s: re.sub('X', '_', s))
print(df['street'])
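Converting the column with str(df['street']) and running re.sub on the result does not update the DataFrame: str() on a Series returns its printed form (index, values, and a "Name: ..., dtype: ..." footer) as one string. A small demonstration, assuming pandas is installed; the two-row frame is a trimmed-down version of the question's data.

```python
import re
import pandas as pd

df = pd.DataFrame({'street': ['13XX First St', '47X Second Ave']})

# str() on a Series returns its printed form: index, values, and a footer line
as_text = str(df['street'])
replaced = re.sub('X', '_', as_text)

print(type(replaced))          # a single str, not a Series
print(replaced)
print(df['street'].tolist())   # the DataFrame column itself is unchanged
```

Applying re.sub to each element (for example via Series.apply) keeps the result as a column that can be assigned back to the DataFrame.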

Upvotes: 0
