Reputation: 49
Purchase_ID | Item | Date |
---|---|---|
1 | A | 21/3/2020 |
1 | B | 21/3/2020 |
1 | C | 21/3/2020 |
2 | X | 27/3/2020 |
2 | Y | 27/3/2020 |
I would like to transform the table into the output below using pandas/any python modules
Purchase_ID | Item | Date |
---|---|---|
1 | A,B,C | 21/3/2020 |
2 | X,Y | 27/3/2020 |
Upvotes: 1
Views: 42
Reputation: 2605
The following works:
df2 = df.pivot_table(index='Purchase_ID', values=['Item','Date'], aggfunc=set)
Note: you'll get a set, rather than list, for the Items and Dates. This avoids duplicating the dates.
If you want to get rid of the sets, you can convert to strings:
df2['Date'] = df2['Date'].apply(lambda x: list(x)[0])
df2['Item'] = df2['Item'].apply(lambda x: ','.join(list(x))
Then df2 is the DataFrame you're looking for.
out:
Date Item
Purchase_ID
1 21/3/2020 A,B,C
2 27/3/2020 X,Y
Upvotes: 1
Reputation: 24324
try via groupby()
and agg()
:
out=df.groupby('Purchase_ID',as_index=False).agg({'Item':','.join,'Date':'first'})
If needed unique elements then use:
out=df.groupby('Purchase_ID',as_index=False).agg({'Item':lambda x:','.join(set(x)),'Date':'first'})
output of out
:
Purchase_ID Item Date
0 1 A,C,B 21/3/2020
1 2 Y,X 27/3/2020
Upvotes: 1