N_Spen

Reputation: 61

Pandas Split DataFrame

I have a df that currently has 4 columns. The first column is a combination of 3 items delimited by _, for example 44_title_iphone6_32GB. What I want is 44, title, and iphone6_32GB, each in its own new column. However, I can't do a simple split on _, because that would also separate iphone6 and 32GB. How can I accomplish this? The other issue is that the last of the 3 items isn't consistent in length, e.g. 44_title_iphone5_32gb_white. Regardless, I still want number, title, and description each in a new column.

Help?

Upvotes: 0

Views: 902

Answers (1)

DSM

Reputation: 352989

str.split accepts an n parameter that caps the number of splits:

>>> df = pd.DataFrame({"stuff": ["44_title_iphone6_32GB", "44_title_iphone5_32gb_white"]})
>>> df
                         stuff
0        44_title_iphone6_32GB
1  44_title_iphone5_32gb_white
>>> df["stuff"].str.split("_", n=2)
0          [44, title, iphone6_32GB]
1    [44, title, iphone5_32gb_white]
Name: stuff, dtype: object
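
As a rough illustration of why this works: n caps the number of cuts in much the same way maxsplit does for Python's built-in str.split, so everything after the second underscore stays in one piece. A minimal sketch without pandas (the variable names are only for illustration):

```python
# Plain-Python illustration (no pandas): a maxsplit of 2 means at most two cuts,
# so the remainder after the second underscore is kept as a single item.
samples = ["44_title_iphone6_32GB", "44_title_iphone5_32gb_white"]

parts = [s.split("_", 2) for s in samples]

for number, title, description in parts:
    print(number, title, description)
```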

And then if we .apply(pd.Series), we can promote these to columns:

>>> df["stuff"].str.split("_", n=2).apply(pd.Series)
    0      1                   2
0  44  title        iphone6_32GB
1  44  title  iphone5_32gb_white

UPDATE:

Note that these days you can use expand=True instead of apply(pd.Series):

>>> df["stuff"].str.split("_", n=2, expand=True)
    0      1                   2
0  44  title        iphone6_32GB
1  44  title  iphone5_32gb_white
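
To get the pieces into named columns on the original frame, one option is to assign the expanded result directly. This is a sketch that assumes a reasonably recent pandas; the column names number, title, and description come from the wording of the question, not from any real schema:

```python
import pandas as pd

df = pd.DataFrame({"stuff": ["44_title_iphone6_32GB", "44_title_iphone5_32gb_white"]})

# n=2 limits the split to two cuts; expand=True returns a DataFrame with columns 0, 1, 2.
# The target column names are illustrative, taken from the question's wording.
df[["number", "title", "description"]] = df["stuff"].str.split("_", n=2, expand=True)

# The pieces are still strings; convert the number column if integers are needed.
df["number"] = df["number"].astype(int)

print(df)
```

Note that n is passed by keyword here, since newer pandas versions no longer accept it positionally.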

Upvotes: 1
