Difference between rows as a list within a column

Question

I have a (messy) pd.df like this,

Number    Day        Colour   City
123       Monday     Red      London
234       Monday     Blue     Paris
234       Wednesday  Yellow   Paris
345       Tuesday    Green    Berlin

Is there an elegant way of making a new df which has the same columns but shows the differences between two rows of like ['Number'] as a list within a columns. So the desired output is something like this.

Number    Day                Colour        City
123       Monday             Red           London
234       Monday, Wednesday  Blue, Yellow  Paris
345       Tuesday            Green         Berlin

Even a point in the right direction regarding which functions I should be looking at would be helpful. The values I am interested in will always be a string rather than a float or integer. Many thanks.

piRSquared · Accepted Answer

`groupby` and `agg`

NOTE: {*s} returns a set of all the things in s. This makes sure that we are only joining unique things and explains why we get Paris and not Paris, Paris

df.groupby('Number').agg(lambda s: ', '.join({*s}))

                      Day        Colour    City
Number                                         
123                Monday           Red  London
234     Wednesday, Monday  Blue, Yellow   Paris
345               Tuesday         Green  Berlin

OR

df.groupby('Number', as_index=False).agg(lambda s: ', '.join({*s}))

   Number                Day        Colour    City
0     123             Monday           Red  London
1     234  Wednesday, Monday  Blue, Yellow   Paris
2     345            Tuesday         Green  Berlin

Difference between rows as a list within a column

Answers (2)

`groupby` and `agg`

Related Questions

Difference between rows as a list within a column

Answers (2)

groupby and agg

Related Questions

`groupby` and `agg`