Brian Guan

Reputation: 323

splitting pandas df column into multiple columns

I have a df like this:

id  | authors
1   | smith, john; cameron, james;
2   | guan, brian;
3   | obs, noah; mumm, erik; lee, matt;

and want to split it into:

id  | author1     | author2       | author3
1   | smith, john | cameron, james|
2   | guan, brian |               |
3   | obs, noah   | mumm, erik    | lee, matt

I know str.split() will split a string on a delimiter, but it's tricky because some rows have one author, some have two, and some have more.

Upvotes: 1

Views: 42

Answers (2)

Mehdi Golzadeh

Reputation: 2583

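Use str.split with expand=True to spread the authors across columns, then pd.concat to reattach the id column:

# .str[0:-1] drops the trailing ';' before splitting on '; '
df = pd.concat([df[['id']], df['authors'].str[0:-1].str.split('; ', expand=True)], axis=1)
# hard-coded names: this assumes the widest row has exactly three authors
df.columns = ['id', 'author1', 'author2', 'author3']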
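
Since the question says the number of authors varies, the column names could also be built from however many columns the split actually produces. A minimal sketch, assuming the sample data from the question and an illustrative `author` prefix (the names `parts` and `result` are mine, not from the answer):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "authors": [
        "smith, john; cameron, james;",
        "guan, brian;",
        "obs, noah; mumm, erik; lee, matt;",
    ],
})

# Drop the trailing ';' and split on '; ' into as many columns as needed.
parts = df["authors"].str[:-1].str.split("; ", expand=True)

# Name the columns author1..authorN from the actual column count,
# instead of hard-coding three names.
parts.columns = [f"author{i + 1}" for i in range(parts.shape[1])]

# Reattach the id column next to the split authors.
result = pd.concat([df[["id"]], parts], axis=1)
print(result)
```

Rows with fewer authors than the widest row are padded with missing values in the extra columns.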

Upvotes: 1

Quang Hoang

Reputation: 150735

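It looks like you can use str.split with the expand option, after removing the trailing semicolon with rstrip. Note that strip/rstrip take a set of characters, not a regex, so a pattern like ';\s*' would also strip the letter 's' from the ends of the names:

df[['id']].join(df.authors.str.rstrip('; ').str.split('; ', expand=True))

Output:

   id            0               1          2
0   1  smith, john  cameron, james       None
1   2  guan, brian            None       None
2   3    obs, noah      mumm, erik  lee, matt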
Upvotes: 1
