How to split one pandas dataframe column which contains combined values to several columns

Question

Suppose I have a pandas DataFrame like the following:

import pandas as pd

df = pd.DataFrame([['Python','p1,p2,p3,p4,p5'],
                   ['Java','j1,j2,j3'],
                   ['C++','c1,c2,c3']], columns=['name','features'])

So that it looks like:

         name        features
    0  Python  p1,p2,p3,p4,p5
    1    Java        j1,j2,j3
    2     C++        c1,c2,c3

I would like to split the 'features' column into separate columns, keeping only the first 3 features (so 'p1,p2,p3,p4,p5' becomes 'p1,p2,p3'). My expected final DataFrame is:

          name    feature1 feature2  feature3
    0     Python  p1       p2        p3
    1     Java    j1       j2        j3
    2     C++     c1       c2        c3

How should I do it? Thanks.

I searched several SO answers related to splitting columns, but none of them met my requirements. I am new to pandas. Feel free to edit the question if it is not well formatted.

jezrael · Accepted Answer

You can use join with a new DataFrame created by str.split on the column extracted by pop, select only the first 3 columns with iloc, and finally add a prefix with add_prefix:

df = df.join(df.pop('features').str.split(',',expand=True).iloc[:, :3].add_prefix('feature'))
print (df)
     name feature0 feature1 feature2
0  Python       p1       p2       p3
1    Java       j1       j2       j3
2     C++       c1       c2       c3
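
For readers new to pandas, the one-liner can be unpacked into separate steps. The following is only a sketch of the same chain with intermediate variables; the names features, parts, first_three and result are chosen here for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame([['Python', 'p1,p2,p3,p4,p5'],
                   ['Java', 'j1,j2,j3'],
                   ['C++', 'c1,c2,c3']], columns=['name', 'features'])

# pop removes 'features' from df in place and returns it as a Series
features = df.pop('features')

# expand=True returns a DataFrame with one column per piece, labelled 0, 1, 2, ...
parts = features.str.split(',', expand=True)

# keep only the first three pieces (positional column slice)
first_three = parts.iloc[:, :3]

# the integer labels 0, 1, 2 become 'feature0', 'feature1', 'feature2'
first_three = first_three.add_prefix('feature')

# join aligns on the row index, which both frames share
result = df.join(first_three)
print(result)
```

Each intermediate can be printed on its own to see what that step contributes.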

This is the same as the solution with drop:

df = df.drop('features', axis=1).join(df['features'].str.split(',', expand=True).iloc[:, :3]
                                                    .add_prefix('feature'))
print (df)
     name feature0 feature1 feature2
0  Python       p1       p2       p3
1    Java       j1       j2       j3
2     C++       c1       c2       c3
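
One practical difference between the two variants is worth keeping in mind: pop modifies the DataFrame it is called on, while drop returns a new DataFrame and leaves the original alone. A small sketch of that behavior follows; the helper make_df and the variable names are hypothetical and used only for this illustration.

```python
import pandas as pd

def make_df():
    # hypothetical helper: builds a fresh copy of the sample data each time
    return pd.DataFrame([['Python', 'p1,p2,p3,p4,p5'],
                         ['Java', 'j1,j2,j3']], columns=['name', 'features'])

# pop: the column is removed from df_pop itself and returned as a Series
df_pop = make_df()
popped = df_pop.pop('features')
print(df_pop.columns.tolist())

# drop: df_drop keeps its columns; the result is a separate DataFrame
df_drop = make_df()
dropped = df_drop.drop('features', axis=1)
print(df_drop.columns.tolist())
print(dropped.columns.tolist())
```

So after the pop version has been run, df no longer has a 'features' column, and either snippet should be run on a freshly created df.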

Also, if you need the numbering to start from 1, use rename:

f = lambda x: 'feature' + str(x + 1)
df = df.join(df.pop('features').str.split(',', expand=True).iloc[:, :3].rename(columns=f))
print (df)

     name feature1 feature2 feature3
0  Python       p1       p2       p3
1    Java       j1       j2       j3
2     C++       c1       c2       c3
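
Note that iloc[:, :3] only truncates: if no row contained three or more features, fewer than three columns would come back. A possible variant, sketched here under the assumption that such short rows can occur (the sample data below is made up for the illustration and is not from the question), uses reindex to both truncate and pad to exactly three columns:

```python
import pandas as pd

# hypothetical data in which no row has three features
df = pd.DataFrame([['Python', 'p1,p2'],
                   ['Java', 'j1']], columns=['name', 'features'])

# the split yields only two columns here, labelled 0 and 1
parts = df.pop('features').str.split(',', expand=True)

# reindex keeps labels 0, 1, 2: extra columns are dropped, missing ones are added as NaN
parts = parts.reindex(columns=range(3))

# number the columns from 1, as in the expected output of the question
parts.columns = [f'feature{i + 1}' for i in range(3)]

result = df.join(parts)
print(result)
```

Missing pieces appear as None or NaN, which can then be handled with fillna if needed.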
