Reputation: 39
I´m looking to merge rows by some id but the problem is that I want to keep the unique values from multiple rows as a list while only keeping the duplicates from multiple rows as a single element.
The raw data would look like something like this:
| id | info1 | info2 | info3| info4|
| -- | ---- | ------| -----| -----|
| 1 | 'a' | xxx | yyy | |
| 1 | 'b' | xxx | yyy | |
| 2 | 'c' | mmm | nnn | |
| 3 | 'd' | uuu | | |
| 3 | 'e' | uuu | ooo | |
| 4 | 'f' | xy | | |
| 4 | 'g' | xy | | |
(The blanks represent missing values)
The desired output data would look like:
| id | info1 | info2 | info3| info4 |
| -- | ---- | ------| -----| ------|
| 1 | ['a','b'] | xxx | yyy | |
| 2 | 'c' | mmm | nnn | |
| 3 | ['d','e'] | uuu | ooo | |
| 4 | ['f','g'] | xy | | |
I´m quite new to this. Hopefully I expressed myself clear here.
Upvotes: 0
Views: 1221
Reputation: 7903
Try this:
df.groupby('id').agg(pd.unique).applymap(lambda x: x[0] if len(x)==1 else x)
info1 info2 info3
id
1 ['a', 'b'] xxx yyy
2 'c' mmm nnn
3 ['d', 'e'] uuu ooo
UPDATE
In case of missing values, we need some little changes and more conditions.
(
df.groupby('id')
.agg(lambda x:
list(set([elem for elem in x if elem==elem]))
)
.applymap(lambda x:
x[0] if len(x)==1 else list(x) if len(x)>1 else ''
)
)
Output:
info1 info2 info3 info4
id
1 ['b', 'a'] xxx yyy
2 'c' mmm nnn
3 ['d', 'e'] uuu ooo
4 ['g', 'f'] xy
Upvotes: 2