Reputation: 91
I have a (messy) pd.df like this,
Number Day Colour City
123 Monday Red London
234 Monday Blue Paris
234 Wednesday Yellow Paris
345 Tuesday Green Berlin
Is there an elegant way of making a new df which has the same columns but shows the differences between two rows of like ['Number'] as a list within a columns. So the desired output is something like this.
Number Day Colour City
123 Monday Red London
234 Monday, Wednesday Blue, Yellow Paris
345 Tuesday Green Berlin
Even a point in the right direction regarding which functions I should be looking at would be helpful. The values I am interested in will always be a string rather than a float or integer. Many thanks.
Upvotes: 2
Views: 72
Reputation: 294318
groupby
and agg
NOTE: {*s}
returns a set
of all the things in s
. This makes sure that we are only joining unique things and explains why we get Paris
and not Paris, Paris
df.groupby('Number').agg(lambda s: ', '.join({*s}))
Day Colour City
Number
123 Monday Red London
234 Wednesday, Monday Blue, Yellow Paris
345 Tuesday Green Berlin
OR
df.groupby('Number', as_index=False).agg(lambda s: ', '.join({*s}))
Number Day Colour City
0 123 Monday Red London
1 234 Wednesday, Monday Blue, Yellow Paris
2 345 Tuesday Green Berlin
Upvotes: 2
Reputation: 21719
Here's a way to do it:
df = (df
.groupby('Number')
.agg(','.join)
.applymap(lambda x: x.split(',') if ',' in x else x)
.reset_index())
Number Day Colour City
0 123 Monday Red London
1 234 [Monday, Wednesday] [Blue, Yellow] [Paris, Paris]
2 345 Tuesday Green Berlin
Upvotes: 0