Reputation: 61
I have a df that currently has 4 columns. The first column is a combination of 3 items delimited by _. For example: 44_title_iphone6_32GB
What I want is 44, title, and iphone6_32gb, each in its own new column. However, I can't simply split on _ because that would also separate iphone6 and 32gb. How can I accomplish this? The other issue is that the last of the 3 items isn't consistent in length, e.g. 44_title_iphone5_32gb_white
Regardless of its length, I still want number, title, and description each in its own new column.
Help?
Upvotes: 0
Views: 902
Reputation: 352989
str.split accepts an n parameter that caps the number of splits (pass it by keyword, n=2; pandas 2.0 and later no longer accept it positionally):
>>> import pandas as pd
>>> df = pd.DataFrame({"stuff": ["44_title_iphone6_32GB", "44_title_iphone5_32gb_white"]})
>>> df
                         stuff
0        44_title_iphone6_32GB
1  44_title_iphone5_32gb_white
>>> df["stuff"].str.split("_", n=2)
0          [44, title, iphone6_32GB]
1    [44, title, iphone5_32gb_white]
Name: stuff, dtype: object
And then if we .apply(pd.Series), we can promote these to columns:
>>> df["stuff"].str.split("_", n=2).apply(pd.Series)
    0      1                   2
0  44  title        iphone6_32GB
1  44  title  iphone5_32gb_white
UPDATE:
Note that these days you can pass expand=True instead of using apply(pd.Series):
>>> df["stuff"].str.split("_", n=2, expand=True)
    0      1                   2
0  44  title        iphone6_32GB
1  44  title  iphone5_32gb_white
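To get the pieces back into the original frame as named columns, which is what the question asks for, one option is to assign the expanded result to a list of new column names. This is a minimal sketch: the names number, title, and description are taken from the question's wording and are only illustrative, and it assumes every value contains at least two underscores.

```python
import pandas as pd

df = pd.DataFrame({"stuff": ["44_title_iphone6_32GB", "44_title_iphone5_32gb_white"]})

# Split on the first two underscores only, so the remainder stays in one piece.
parts = df["stuff"].str.split("_", n=2, expand=True)

# Illustrative column names (from the question's wording); the three
# expanded columns are assigned in order.
df[["number", "title", "description"]] = parts

print(df)
```

The split pieces are still strings, so number would need a separate conversion (for example with astype(int)) if a numeric column is wanted.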
Upvotes: 1