Reputation: 903
I would like to group the following Pandas DataFrame by ID
column:
|----+----------------------------------------+-----------------|
| ID | Elements | Colors |
|----+----------------------------------------+-----------------|
| A | '1st element, 2d element, 3d element' | 'red, blue' |
| A | '2d element, 4th element' | 'blue, green' |
| B | '3d element, 5th element, 6th element' | 'white, purple' |
| B | '3d element, 5th element, 7th element' | 'white, teal' |
| B | '3d element, 5th element, 8th element' | 'white, black' |
|----+----------------------------------------+-----------------|
In order to obtain the following Pandas DataFrame:
|----+-----------------------------------------------------------------+------------------------------|
| ID | Elements | Colors |
|----+-----------------------------------------------------------------+------------------------------|
| A | '1st element, 2d element, 3d element, 4th element' | 'red, blue, green' |
| B | '3d element, 5h element, 6th element, 7th element, 8th element' | 'white, purple, teal, black' |
|----+-----------------------------------------------------------------+------------------------------|
Upvotes: 0
Views: 57
Reputation: 2345
import pandas as pd
from itertools import chain
data = {'ID': ['A', 'A', 'B', 'B', 'B'],
'Elements': ['1st element, 2d element, 3d element', '2d element, 4th element', '3d element, 5th element, 6th element', '3d element, 5th element, 7th element', '3d element, 5th element, 8th element'],
'Colors': ['red, blue', 'blue, green', 'white, purple', 'white, teal', 'white, black']}
df = pd.DataFrame(data)
def comma_split(s):
return s.split(", ")
def uniques(lists):
return ", ".join(set(chain(*map(comma_split, lists))))
df.groupby('ID').aggregate(uniques)
Elements Colors
ID
A 1st element, 4th element, 3d element, 2d element green, red, blue
B 5th element, 8th element, 3d element, 7th elem... purple, white, black, teal
Upvotes: 1