Tony Ng

Reputation: 164

Split Pandas Series into Multiple Columns by Substring

I am trying to split a DataFrame column (a Series) into multiple columns, probably with a regex.

Replicable code:

import pandas as pd
df = pd.DataFrame({"Animals":["(Cat1, Dog1)", "(Cat1, Dog2)", "(Cat1, Dog3)", "(Cat2, Dog1)", "(Cat2, Dog2)", "(Cat2, Dog3)"]})

Input Table:

        Animals
0  (Cat1, Dog1)
1  (Cat1, Dog2)
2  (Cat1, Dog3)
3  (Cat2, Dog1)
4  (Cat2, Dog2)
5  (Cat2, Dog3)

Desired Table: each pair split into separate columns (image not available).

Thanks in advance!

Upvotes: 3

Views: 275

Answers (3)

Pygirl

Reputation: 13349

Try dropping the surrounding parentheses with .str[1:-1] and then splitting on ', ' with expand=True:

df[['Animal1', 'Animal2']] = df['Animals'].str[1:-1].str.split(', ', expand=True)

    Animals         Animal1 Animal2
0   (Cat1, Dog1)    Cat1    Dog1
1   (Cat1, Dog2)    Cat1    Dog2
2   (Cat1, Dog3)    Cat1    Dog3
3   (Cat2, Dog1)    Cat2    Dog1
4   (Cat2, Dog2)    Cat2    Dog2
5   (Cat2, Dog3)    Cat2    Dog3
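A variant of the same slicing idea, sketched under the assumption that some rows might lack the parentheses or have uneven spacing (the question's data does not). split_pair is a hypothetical helper name. It uses str.strip('()'), which only removes parentheses that are actually present, and then trims each piece after the split:

```python
import pandas as pd


def split_pair(s: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: split strings like '(Cat1, Dog1)' into two columns."""
    # Drop parentheses at either end (only where present), then split on the comma.
    parts = s.str.strip("()").str.split(",", expand=True)
    # Trim any whitespace left around each piece by the split.
    return parts.apply(lambda col: col.str.strip())


df = pd.DataFrame({"Animals": ["(Cat1, Dog1)", "(Cat1, Dog2)", "(Cat1, Dog3)",
                               "(Cat2, Dog1)", "(Cat2, Dog2)", "(Cat2, Dog3)"]})
df[["Animal1", "Animal2"]] = split_pair(df["Animals"])
print(df)

# Irregular input: a missing pair of parentheses and uneven spacing still split cleanly.
irregular = split_pair(pd.Series(["Cat1,Dog1", "( Cat2 ,  Dog2 )"]))
print(irregular)
```

The trade-off is an extra pass over each column for the strip; on well-formed data like the question's, the one-liner above gives the same result.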

Upvotes: 3

David Erickson

Reputation: 16683

EDIT:

Per comments, Shubham's solution is the cleanest:

df[['Animals1', 'Animals2']] = df['Animals'].str.extract(r'(\w+), (\w+)')
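If you would rather not repeat the column names on the left-hand side, named groups in the pattern become the column names of the frame returned by str.extract. A sketch on the question's data; the stricter pattern (literal parentheses, \s* for optional spaces) is an assumption, not part of the solution above:

```python
import pandas as pd

df = pd.DataFrame({"Animals": ["(Cat1, Dog1)", "(Cat1, Dog2)", "(Cat1, Dog3)",
                               "(Cat2, Dog1)", "(Cat2, Dog2)", "(Cat2, Dog3)"]})

# Named groups (?P<name>...) become the column names of the extracted frame;
# \( and \) match the literal parentheses and \s* allows any amount of space.
pattern = r"\((?P<Animal1>\w+),\s*(?P<Animal2>\w+)\)"

# join aligns the extracted columns with df on the index.
df = df.join(df["Animals"].str.extract(pattern))
print(df)
```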

You can also use replace to get rid of the parentheses and spaces, and then split(',') with expand=True to create the new columns:

df[['Animal1', 'Animal2']] = (df['Animals'].replace([r'\(', r'\)', r'\s+'], '', regex=True)
                              .str.split(',', expand=True))
df
Out[1]: 
        Animals  Animal1  Animal2
0  (Cat1, Dog1)     Cat1     Dog1
1  (Cat1, Dog2)     Cat1     Dog2
2  (Cat1, Dog3)     Cat1     Dog3
3  (Cat2, Dog1)     Cat2     Dog1
4  (Cat2, Dog2)     Cat2     Dog2
5  (Cat2, Dog3)     Cat2     Dog3
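The three patterns can also be folded into a single character class, [()\s], passed to str.replace. A minimal sketch on the question's data that builds both versions and checks that they match:

```python
import pandas as pd

df = pd.DataFrame({"Animals": ["(Cat1, Dog1)", "(Cat1, Dog2)", "(Cat1, Dog3)",
                               "(Cat2, Dog1)", "(Cat2, Dog2)", "(Cat2, Dog3)"]})

# The answer's approach: three separate patterns passed to Series.replace.
a = (df["Animals"].replace([r"\(", r"\)", r"\s+"], "", regex=True)
     .str.split(",", expand=True))

# One pattern instead: [()\s] matches any parenthesis or whitespace character,
# and every match is replaced with the empty string.
b = df["Animals"].str.replace(r"[()\s]", "", regex=True).str.split(",", expand=True)

print(a.equals(b))  # True for this data
```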

Upvotes: 3

techytushar

Reputation: 803

One way would be:

import pandas as pd
df = pd.DataFrame({"Animals":["(Cat1, Dog1)", "(Cat1, Dog2)", "(Cat1, Dog3)", "(Cat2, Dog1)", "(Cat2, Dog2)", "(Cat2, Dog3)"]})
df['Animal1'] = df['Animals'].map(lambda x: x.split(', ')[0][1:])   # first item, drop '('
df['Animal2'] = df['Animals'].map(lambda x: x.split(', ')[1][:-1])  # second item, drop ')'
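The two lambdas above split every string twice. A sketch that parses each string once and assigns both columns together, assuming every value has exactly the form (A, B):

```python
import pandas as pd

df = pd.DataFrame({"Animals": ["(Cat1, Dog1)", "(Cat1, Dog2)", "(Cat1, Dog3)",
                               "(Cat2, Dog1)", "(Cat2, Dog2)", "(Cat2, Dog3)"]})

# Parse each string once: x[1:-1] drops the outer parentheses, then split on ", ".
pairs = [x[1:-1].split(", ") for x in df["Animals"]]

# Build both columns in one assignment; index=df.index keeps rows aligned with df.
df[["Animal1", "Animal2"]] = pd.DataFrame(pairs, index=df.index)
print(df)
```

A value that does not contain exactly one ", " would produce a row of the wrong length, so this is only safe on data as regular as the question's.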

Upvotes: 2
