Mewtwo

Reputation: 1311

grouping rows in a list of lists in pandas

I have a dataframe that looks like this:

ID Description
1  A
1  B
1  C
2  A
2  C
3  A

I would like to group by the ID column and get the descriptions as a list of lists, like this:

ID Description
1  [["A"],["B"],["C"]]
2  [["A"],["C"]]
3  [["A"]]

I tried df.groupby('ID')['Description'].apply(list), but this creates only the "first level" of lists.
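
For reference, a minimal sketch of what that attempt returns, assuming the frame is built from the sample table above (the variable name flat is only illustrative):

```python
import pandas as pd

# Sample data matching the table in the question
df = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 3],
                   'Description': ['A', 'B', 'C', 'A', 'C', 'A']})

# One list per ID, but each element is still a plain string, not a one-item list
flat = df.groupby('ID')['Description'].apply(list)
print(flat.tolist())  # [['A', 'B', 'C'], ['A', 'C'], ['A']]
```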

Upvotes: 2

Views: 903

Answers (2)

jezrael

Reputation: 862481

You need to create the inner lists first:

print (df)
   ID Description
0   1         Aas
1   1           B
2   1           C
3   2           A
4   2           C
5   3           A

df = df['Description'].apply(lambda x: [x]).groupby(df['ID']).apply(list).reset_index()

Another solution, similar to the one by @jp_data_analysis, with a single apply:

df = df.groupby('ID')['Description'].apply(lambda x: [[y] for y in x]).reset_index()

And a pure Python solution:

# pair each ID with its description
a = list(zip(df['ID'], df['Description']))
d = {}
for k, v in a:
    # wrap every description in its own list before appending to the group
    d.setdefault(k, []).append([v])
df = pd.DataFrame({'ID':list(d.keys()), 'Description':list(d.values())},
                   columns=['ID','Description'])

print (df)
   ID        Description
0   1  [[Aas], [B], [C]]
1   2         [[A], [C]]
2   3              [[A]]
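
As a quick sanity check, the three approaches above can be run side by side on the same sample data. This is only a sketch: the frame is rebuilt from the printed sample, and the names a, b and d are illustrative.

```python
import pandas as pd

# Sample data as printed in this answer (note the multi-character value 'Aas')
df = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 3],
                   'Description': ['Aas', 'B', 'C', 'A', 'C', 'A']})

# 1) wrap each value in a list first, then group by ID
a = df['Description'].apply(lambda x: [x]).groupby(df['ID']).apply(list)

# 2) group first, then wrap each value inside a single apply
b = df.groupby('ID')['Description'].apply(lambda x: [[y] for y in x])

# 3) pure Python: build the nested lists in a dict keyed by ID
d = {}
for k, v in zip(df['ID'], df['Description']):
    d.setdefault(k, []).append([v])

# All three give the same nested lists, in ID order
assert a.tolist() == b.tolist() == list(d.values())
print(b.tolist())  # [[['Aas'], ['B'], ['C']], [['A'], ['C']], [['A']]]
```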

Upvotes: 2

jpp

Reputation: 164623

This differs slightly from @jezrael's answer in that the strings are converted to lists via map. In addition, calling reset_index() adds the "Description" column name explicitly to the output.

import pandas as pd

df = pd.DataFrame([[1, 'A'], [1, 'B'], [1, 'C'], [2, 'A'], [2, 'C'], [3, 'A']], columns=['ID', 'Description'])

df.groupby('ID')['Description'].apply(list).apply(lambda x: list(map(list, x))).reset_index()

# ID Description
# 1 [[A], [B], [C]] 
# 2 [[A], [C]] 
# 3 [[A]] 
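
One caveat worth checking: map(list, x) calls list() on each string, which splits a string into its characters. That is harmless for the single-character values here, but a value such as 'Aas' would be broken up. A small sketch of the difference, using made-up sample data and illustrative names:

```python
import pandas as pd

# Hypothetical sample with one multi-character description
df = pd.DataFrame({'ID': [1, 1, 2], 'Description': ['Aas', 'B', 'A']})
grouped = df.groupby('ID')['Description'].apply(list)

# list('Aas') iterates over the string, giving one element per character
split_chars = grouped.apply(lambda x: list(map(list, x)))

# wrapping each value explicitly keeps the string intact
wrapped = grouped.apply(lambda x: [[v] for v in x])

print(split_chars.tolist())  # [[['A', 'a', 's'], ['B']], [['A']]]
print(wrapped.tolist())      # [[['Aas'], ['B']], [['A']]]
```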

Upvotes: 2
