Reputation: 164
I am trying to split a DataFrame column (a Series) into multiple columns, probably using a regex.
Reproducible code:
import pandas as pd
df = pd.DataFrame({"Animals":["(Cat1, Dog1)", "(Cat1, Dog2)", "(Cat1, Dog3)", "(Cat2, Dog1)", "(Cat2, Dog2)", "(Cat2, Dog3)"]})
Input Table:
Desired Table:
Thanks in advance!
Upvotes: 3
Views: 275
Reputation: 13349
Try:
df[['Animal1', 'Animal2']] = df['Animals'].str[1:-1].str.split(', ', expand=True)
Animals Animal1 Animal2
0 (Cat1, Dog1) Cat1 Dog1
1 (Cat1, Dog2) Cat1 Dog2
2 (Cat1, Dog3) Cat1 Dog3
3 (Cat2, Dog1) Cat2 Dog1
4 (Cat2, Dog2) Cat2 Dog2
5 (Cat2, Dog3) Cat2 Dog3
Upvotes: 3
Reputation: 16683
EDIT:
Per comments, Shubham's solution is the cleanest:
df[['Animal1', 'Animal2']] = df['Animals'].str.extract(r'(\w+), (\w+)')
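A variant of the same idea, sketched with named capture groups so that `str.extract` names the new columns itself. The pattern here also matches the literal parentheses, and the third row is a made-up value added only to show what happens when a row does not match.

```python
import pandas as pd

df = pd.DataFrame({"Animals": ["(Cat1, Dog1)", "(Cat2, Dog3)", "no match here"]})

# Named groups become the column names of the extracted frame.
pattern = r"\((?P<Animal1>\w+), (?P<Animal2>\w+)\)"
extracted = df["Animals"].str.extract(pattern)

# Rows that do not match the pattern come back as NaN in every group.
result = df.join(extracted)
print(result)
```

Unlike the split-based approaches, a non-matching row does not raise an error here; it simply yields missing values, which may or may not be what you want.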
You can also use replace to get rid of the parentheses and spaces, and then split(',') with expand=True to create the new columns:
df[['Animal1', 'Animal2']] = (df['Animals'].replace([r'\(', r'\)', r'\s+'], '', regex=True)
                              .str.split(',', expand=True))
df
Out[1]:
Animals Animal1 Animal2
0 (Cat1, Dog1) Cat1 Dog1
1 (Cat1, Dog2) Cat1 Dog2
2 (Cat1, Dog3) Cat1 Dog3
3 (Cat2, Dog1) Cat2 Dog1
4 (Cat2, Dog2) Cat2 Dog2
5 (Cat2, Dog3) Cat2 Dog3
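The three patterns can also be folded into one character class. This is a sketch of an equivalent cleanup using `Series.str.replace` rather than the code above; the name `cleaned` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Animals": ["(Cat1, Dog1)", "(Cat2, Dog3)"]})

# Remove every parenthesis and whitespace character in one pass.
cleaned = df["Animals"].str.replace(r"[()\s]+", "", regex=True)

# What remains is "Cat1,Dog1", so a plain comma is the separator.
df[["Animal1", "Animal2"]] = cleaned.str.split(",", expand=True)
print(df)
```

Note that either form removes all whitespace, so a name with an internal space would be changed as well.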
Upvotes: 3
Reputation: 803
One way would be:
df = pd.DataFrame({"Animals":["(Cat1, Dog1)", "(Cat1, Dog2)", "(Cat1, Dog3)", "(Cat2, Dog1)", "(Cat2, Dog2)", "(Cat2, Dog3)"]})
df['Animal1'] = df['Animals'].map(lambda x: x.split(', ')[0][1:])
df['Animal2'] = df['Animals'].map(lambda x: x.split(', ')[1][:-1])
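As a quick sanity check, the element-wise `map` approach can be compared against a vectorized slice-and-split on the same data. This is only a sketch on a shortened sample; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Animals": ["(Cat1, Dog1)", "(Cat1, Dog2)", "(Cat2, Dog3)"]})

# Approach A: element-wise with map, splitting each string in Python.
a1 = df["Animals"].map(lambda x: x.split(", ")[0][1:])
a2 = df["Animals"].map(lambda x: x.split(", ")[1][:-1])

# Approach B: vectorized string methods on the whole column.
b = df["Animals"].str[1:-1].str.split(", ", expand=True)

# Compare the resulting values column by column.
same = a1.tolist() == b[0].tolist() and a2.tolist() == b[1].tolist()
print(same)
```

Both give the same values on this sample; the `map` version splits each string twice, once per output column.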
Upvotes: 2