How to split one column into multiple columns in Pandas using regular expression?

Question

For example, if I have a home address like this:

71 Pilgrim Avenue, Chevy Chase, MD

in a column named 'address'. I would like to split it into columns 'street', 'city', 'state', respectively.

What is the best way to achieve this using Pandas?

I have tried df[['street', 'city', 'state']] = df['address'].findall(r"myregex").

But the error I got is Must have equal len keys and value when setting with an iterable.

Thank you for your help :)

jezrael · Accepted Answer

You can split on the regex ,\s+ (a comma followed by one or more whitespace characters):

# borrowing the sample from `Allen`
# raw string so that \s reaches the regex engine unescaped
df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
print(df)
                              address id             street          city  \
0  71 Pilgrim Avenue, Chevy Chase, MD  a  71 Pilgrim Avenue   Chevy Chase   
1         72 Main St, Chevy Chase, MD  b         72 Main St   Chevy Chase   

  state  
0    MD  
1    MD
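The sample frame itself is not shown in this answer. A minimal sketch for reproducing the output above, assuming a frame that holds only the `address` and `id` columns with the values visible in the printout:

```python
import pandas as pd

# Assumed sample data, reconstructed from the printed output above.
df = pd.DataFrame({
    "address": ["71 Pilgrim Avenue, Chevy Chase, MD",
                "72 Main St, Chevy Chase, MD"],
    "id": ["a", "b"],
})

# Split on a comma followed by one or more whitespace characters.
# A multi-character pattern is treated as a regular expression by default.
parts = df["address"].str.split(r",\s+", expand=True)

# parts has one column per field; assign them to the new column names.
df[["street", "city", "state"]] = parts
print(df)
```

This only works as written when every address splits into exactly three parts; rows with more or fewer commas would produce a different number of columns.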

To also remove the address column, add drop:

df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
df = df.drop('address', axis=1)
print(df)
  id             street         city state
0  a  71 Pilgrim Avenue  Chevy Chase    MD
1  b         72 Main St  Chevy Chase    MD
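If you would rather describe the whole address with one regular expression, as the attempt in the question does, `str.extract` may be a closer fit than `str.findall`: it returns a DataFrame with one column per capture group, whereas `str.findall` returns a single Series of lists. The following is only a sketch; the pattern is a hypothetical one that assumes exactly three comma-separated fields.

```python
import pandas as pd

df = pd.DataFrame({
    "address": ["71 Pilgrim Avenue, Chevy Chase, MD",
                "72 Main St, Chevy Chase, MD"],
})

# Hypothetical pattern: three fields without commas, separated by ", ".
# Named groups become the column names of the result.
pattern = r"^(?P<street>[^,]+),\s+(?P<city>[^,]+),\s+(?P<state>[^,]+)$"

# Rows that do not match the pattern come back as NaN in every column.
extracted = df["address"].str.extract(pattern)

# Attach the new columns to the original frame by index.
df = df.join(extracted)
print(df)
```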
