Neil

Reputation: 8247

How to drop duplicates and dcast a pandas DataFrame into comma-separated values

I have the following DataFrame in pandas:

  tank      nozzle
  1         1
  1         1
  1         2
  1         3
  1         1
  2         2
  2         1
  2         1
  2         2
  2         2
  2         1
  2         3
  2         2

I want the following output:

  tank      nozzle
  1         1,2,3
  2         1,2,3  

The nozzle values for each tank should be unique numbers. How can I do this in pandas?

Upvotes: 1

Views: 90

Answers (1)

jezrael

Reputation: 863206

Convert the nozzle column to strings, call drop_duplicates, and use GroupBy.apply with join:

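# str.join needs strings, so cast the column first
df['nozzle'] = df['nozzle'].astype(str)
# drop repeated (tank, nozzle) rows, then join the remaining values per tank
df1 = df.drop_duplicates().groupby('tank')['nozzle'].apply(','.join).reset_index()
print(df1)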
   tank nozzle
0     1  1,2,3
1     2  2,1,3

An alternative with a lambda function, which leaves the original column as integers:

df1 = (df.drop_duplicates()
       .groupby('tank')['nozzle']
       .apply(lambda x: ','.join(x.astype(str)))
       .reset_index())

To collect lists instead of joined strings:

df1 = df.drop_duplicates().groupby('tank')['nozzle'].apply(list).reset_index()
print(df1)
   tank     nozzle
0     1  [1, 2, 3]
1     2  [2, 1, 3]
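
Note that the results above list tank 2 as 2,1,3, because values are kept in order of first appearance, while the question asks for 1,2,3. If the order matters, one option is to sort before grouping. This is a minimal sketch rather than the only way to do it; it assumes the sample data from the question, rebuilt here by hand so the snippet runs on its own:

```python
import pandas as pd

# Sample data copied from the question (integer columns assumed)
df = pd.DataFrame({
    'tank':   [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
    'nozzle': [1, 1, 2, 3, 1, 2, 1, 1, 2, 2, 1, 3, 2],
})

df1 = (df.drop_duplicates()
         # sort while nozzle is still numeric, so 10 would come after 2
         .sort_values(['tank', 'nozzle'])
         .groupby('tank')['nozzle']
         # cast to str only at join time
         .apply(lambda x: ','.join(x.astype(str)))
         .reset_index())
print(df1)
```

Sorting before the cast to strings avoids lexicographic ordering, where '10' would sort ahead of '2'.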

Upvotes: 3
