Reputation: 147
Suppose I have a pandas DataFrame like the following:
import pandas as pd

df = pd.DataFrame([['Python','p1,p2,p3,p4,p5'],
['Java','j1,j2,j3'],
['C++','c1,c2,c3']], columns=['name','features'])
So that it looks like:
     name        features
0  Python  p1,p2,p3,p4,p5
1    Java        j1,j2,j3
2     C++        c1,c2,c3
I would like to split the 'features' column, keeping only the first 3 features (so 'p1,p2,p3,p4,p5' becomes 'p1,p2,p3'). My expected final DataFrame is:
     name feature1 feature2 feature3
0  Python       p1       p2       p3
1    Java       j1       j2       j3
2     C++       c1       c2       c3
How should I do it? Thanks.
I searched several SO answers about splitting columns, but none of them meets my requirements. I am new to pandas. Feel free to edit the question if it is not well formatted.
Upvotes: 2
Views: 166
Reputation: 862611
You can use join with a new DataFrame created by str.split (with expand=True) on the column extracted by pop, select only the first 3 columns with iloc, and finally add a prefix with add_prefix:
df = df.join(df.pop('features').str.split(',',expand=True).iloc[:, :3].add_prefix('feature'))
print(df)
     name feature0 feature1 feature2
0  Python       p1       p2       p3
1    Java       j1       j2       j3
2     C++       c1       c2       c3
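To make the chained one-liner easier to follow, here is a minimal sketch that performs the same steps one at a time on the question's data. The intermediate names (features, parts, first_three, result) are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [["Python", "p1,p2,p3,p4,p5"], ["Java", "j1,j2,j3"], ["C++", "c1,c2,c3"]],
    columns=["name", "features"],
)

# pop removes 'features' from df in place and returns it as a Series
features = df.pop("features")

# expand=True returns a DataFrame with one column per piece (labels 0, 1, 2, ...);
# rows with fewer pieces are padded with missing values
parts = features.str.split(",", expand=True)

# keep only the first three columns, then prefix the integer labels
first_three = parts.iloc[:, :3].add_prefix("feature")

# join aligns on the row index, which none of the steps above changed
result = df.join(first_three)
print(result)
```

Because pop modifies df in place, running the one-liner a second time on the same df raises a KeyError, since the 'features' column is already gone.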
This is the same as the solution with drop:
df = df.drop('features', axis=1).join(
    df['features'].str.split(',', expand=True).iloc[:, :3].add_prefix('feature'))
print(df)
     name feature0 feature1 feature2
0  Python       p1       p2       p3
1    Java       j1       j2       j3
2     C++       c1       c2       c3
If you need the numbering to start from 1, use rename:
f = lambda x: 'feature' + str(x + 1)
df = df.join(df.pop('features').str.split(',', expand=True).iloc[:, :3].rename(columns=f))
print(df)
     name feature1 feature2 feature3
0  Python       p1       p2       p3
1    Java       j1       j2       j3
2     C++       c1       c2       c3
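If this is needed in more than one place, the same idea can be wrapped in a small helper. The sketch below is one possible way to do it, not part of the original answer: the function name split_first_n and its parameters are hypothetical. It uses reindex instead of iloc so that the result always has exactly n columns, even when no row has that many pieces.

```python
import pandas as pd


def split_first_n(df, column, n, prefix="feature", sep=","):
    """Return a new DataFrame with `column` replaced by its first n pieces.

    Hypothetical helper for illustration; the input df is left unchanged.
    """
    parts = df[column].str.split(sep, expand=True)
    # reindex keeps columns 0..n-1 and adds all-NaN columns if fewer exist
    parts = parts.reindex(columns=range(n))
    # name the columns prefix1, prefix2, ... (1-based)
    parts.columns = [f"{prefix}{i + 1}" for i in range(n)]
    return df.drop(columns=column).join(parts)


df = pd.DataFrame(
    [["Python", "p1,p2,p3,p4,p5"], ["Java", "j1,j2,j3"], ["C++", "c1,c2,c3"]],
    columns=["name", "features"],
)

out = split_first_n(df, "features", 3)
print(out)
```

Rows with fewer than n pieces get missing values in the trailing columns, so a call such as split_first_n(df, "features", 4) leaves feature4 empty for the Java and C++ rows.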
Upvotes: 2