Pandas DataFrame: Merge rows and keep only unique values from multiple rows as a list

Question

I´m looking to merge rows by some id but the problem is that I want to keep the unique values from multiple rows as a list while only keeping the duplicates from multiple rows as a single element.

The raw data would look like something like this:

| id | info1 | info2 | info3| info4|
| -- | ----  | ------| -----| -----|
| 1  | 'a'   |  xxx  |  yyy |      |
| 1  | 'b'   |  xxx  |  yyy |      |
| 2  | 'c'   |  mmm  |  nnn |      |
| 3  |  'd'  |  uuu  |      |      |
| 3  |  'e'  |  uuu  |  ooo |      |
| 4  |  'f'  |  xy   |      |      |
| 4  |  'g'  |  xy   |      |      |

(The blanks represent missing values)

The desired output data would look like:

| id | info1     | info2 | info3| info4 |
| -- | ----      | ------| -----| ------|
| 1  | ['a','b'] |  xxx  |  yyy |       |
| 2  | 'c'       |  mmm  |  nnn |       |
| 3  | ['d','e'] |  uuu  |  ooo |       |
| 4  | ['f','g'] |  xy   |      |       |

I´m quite new to this. Hopefully I expressed myself clear here.

Rabinzel · Accepted Answer

Try this:

df.groupby('id').agg(pd.unique).applymap(lambda x: x[0] if len(x)==1 else x)

         info1 info2 info3
id                        
1   ['a', 'b']   xxx   yyy
2          'c'   mmm   nnn
3   ['d', 'e']   uuu   ooo

UPDATE

In case of missing values, we need some little changes and more conditions.

(
    df.groupby('id')
    .agg(lambda x: 
         list(set([elem for elem in x if elem==elem]))
        )
    .applymap(lambda x: 
              x[0] if len(x)==1 else list(x) if len(x)>1 else ''
             )
)

Output:

         info1 info2 info3 info4
id                              
1   ['b', 'a']   xxx   yyy      
2          'c'   mmm   nnn      
3   ['d', 'e']   uuu   ooo      
4   ['g', 'f']    xy

Pandas DataFrame: Merge rows and keep only unique values from multiple rows as a list

Answers (1)

Related Questions