Vivek Anand

Reputation: 391

How to get a list of lists after pandas groupby

I searched for my problem and found a similar question, but its answers did not give the results I expected, so I am still stuck. I have a list like this:

import pandas as pd

l=[[1,'John','Wed',28],[1,'John','Fri',30],[2,'Alex','Fri',40],[2,'Alex','Fri',60]]

I did:

o=pd.DataFrame(l,columns=['id','name','day','marks'])
r = o.groupby(['id','name','day']).marks.mean().reset_index().values.tolist()

What I got looks like this:

[[1,'John','Wed',28],[1,'John','Fri',30],[2,'Alex','Fri',50]]

Can somebody please help me get something like this:

[[1,'John',[['Wed',28],['Fri',30]]], [2,'Alex',[['Fri',50]]]]

Upvotes: 2

Views: 45

Answers (2)

sammywemmy

Reputation: 28709

You can use defaultdict to regroup the rows of r by (id, name) and get your desired output:

from collections import defaultdict

d = defaultdict(list)

for number, name, day, day_number in r:
    d[(number, name)].append([day, day_number])

# pull the grouped data into list form with a list comprehension
[[*key, value] for key, value in d.items()]

[[1, 'John', [['Fri', 30], ['Wed', 28]]], [2, 'Alex', [['Fri', 50]]]]

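Alternatively, you could run the entire process in plain Python. The caveat is that you have to use defaultdict twice, which makes sense, since each defaultdict performs one grouping step: the first groups by (id, name, day) to compute the mean, and the second groups by (id, name) to nest the results: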

from collections import defaultdict
from statistics import mean

d = defaultdict(list)

for number, name, day, day_number in l:
    d[(number, name, day)].append(day_number)

d = {key:mean(value) for key, value in d.items()}

d = [[*key[:2], [*key[2:],value]] for key, value in d.items()]

box = defaultdict(list)
for number, name, [day, day_number] in d:
    box[(number, name)].append([day, day_number])
    
[[*key, value] for key, value in box.items()]

[[1, 'John', [['Wed', 28], ['Fri', 30]]], [2, 'Alex', [['Fri', 50]]]]
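
The two grouping steps above can also be folded into a single pass with a nested defaultdict. This is only a sketch of that idea, not part of the original approach; the function name nest_mean_marks is made up for illustration, and it assumes each row is exactly [id, name, day, marks]:

```python
from collections import defaultdict
from statistics import mean


def nest_mean_marks(rows):
    """Group rows by (id, name), then by day, averaging the marks per day.

    Hypothetical helper: assumes every row is [id, name, day, marks].
    """
    # Outer key: (id, name). Inner key: day. Inner value: list of marks.
    groups = defaultdict(lambda: defaultdict(list))
    for id_, name, day, marks in rows:
        groups[(id_, name)][day].append(marks)

    # Dicts keep insertion order (Python 3.7+), so people and days
    # come out in order of first appearance in the input.
    return [
        [id_, name, [[day, mean(marks)] for day, marks in days.items()]]
        for (id_, name), days in groups.items()
    ]


l = [[1, 'John', 'Wed', 28], [1, 'John', 'Fri', 30],
     [2, 'Alex', 'Fri', 40], [2, 'Alex', 'Fri', 60]]

result = nest_mean_marks(l)
print(result)
```

The outer dictionary plays the role of box and the inner one the role of d, so the intermediate flat list is not needed.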

Upvotes: 2

YOLO

Reputation: 21729

You can do the following, where df is the DataFrame from the question (called o there):

# get mean marks
df['mean_marks'] = df.groupby(['id', 'name', 'day'])['marks'].transform('mean')

# create list of day, marks
df['mark_list'] = df[['day','mean_marks']].agg(list, axis=1)

# aggregate; set() drops duplicate (day, mean) pairs but does not guarantee their order
df = (df
      .groupby(['id', 'name'])['mark_list']
      .apply(list)
      .apply(lambda x: [list(y) for y in set([tuple(j) for j in x])])
      .reset_index())

print(df)

   id  name               mark_list
0   1  John  [[Wed, 28], [Fri, 30]]
1   2  Alex             [[Fri, 50]]
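
If the order of the days matters, or you want the nested list itself rather than a DataFrame, one possible variation is to group twice with sort=False. This is a sketch under the assumption that the data is the list l from the question; the names means and result are made up here:

```python
import pandas as pd

l = [[1, 'John', 'Wed', 28], [1, 'John', 'Fri', 30],
     [2, 'Alex', 'Fri', 40], [2, 'Alex', 'Fri', 60]]
o = pd.DataFrame(l, columns=['id', 'name', 'day', 'marks'])

# First grouping: mean marks per (id, name, day).
# sort=False keeps the groups in order of first appearance.
means = (o.groupby(['id', 'name', 'day'], sort=False)['marks']
          .mean()
          .reset_index())

# Second grouping: collect the [day, mean] pairs for each (id, name).
result = [
    [int(id_), name, grp[['day', 'marks']].values.tolist()]
    for (id_, name), grp in means.groupby(['id', 'name'], sort=False)
]

print(result)
```

Because the mean is taken before the second grouping, there are no duplicate pairs to remove, so set() is not needed. Note that the means come back as floats (28.0, 30.0, 50.0).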

Upvotes: 2
