Reputation: 391
I searched for my problem and found a similar question, but its answers did not give the results I expected, so I am still stuck. I have a list like this:
import pandas as pd
l=[[1,'John','Wed',28],[1,'John','Fri',30],[2,'Alex','Fri',40],[2,'Alex','Fri',60]]
I did:
o=pd.DataFrame(l,columns=['id','name','day','marks'])
r = o.groupby(['id','name','day']).marks.mean().reset_index().values.tolist()
Now what I get looks like this:
[[1,'John','Wed',28],[1,'John','Fri',30],[2,'Alex','Fri',50]]
Can somebody please help me get something like this instead?
[[1,'John',[['Wed',28],['Fri',30]]], [2,'Alex',[['Fri',50]]]]
Upvotes: 2
Reputation: 28709
You can use defaultdict to get the data from r into your desired output:
from collections import defaultdict
d = defaultdict(list)
for number, name, day, day_number in r:
    d[(number, name)].append([day, day_number])
# pull data into list form with a list comprehension
[[*key, value] for key, value in d.items()]
[[1, 'John', [['Fri', 30], ['Wed', 28]]], [2, 'Alex', [['Fri', 50]]]]
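Because the pandas groupby sorts its keys by default, rows of r that share an id and name sit next to each other, so itertools.groupby is another way to do the same regrouping. This is only a sketch of a different technique, not the method above: the literal r below is an assumption about what the question's code returns (sorted by key, with the means as floats), and it only works while matching rows stay adjacent.

```python
from itertools import groupby
from operator import itemgetter

# Assumed contents of r as produced by the question's pandas code:
# sorted by (id, name, day), with mean marks as floats.
r = [[1, 'John', 'Fri', 30.0], [1, 'John', 'Wed', 28.0], [2, 'Alex', 'Fri', 50.0]]

# itertools.groupby only merges *consecutive* rows with the same key,
# so this relies on r already being ordered by (id, name).
result = [
    [*key, [row[2:] for row in rows]]
    for key, rows in groupby(r, key=itemgetter(0, 1))
]
print(result)
```

If r could arrive unsorted, the defaultdict version is the safer choice, since it does not depend on row order.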
Alternatively, you could run the entire process in plain Python. The caveat is that you have to use defaultdict twice, which makes sense since, in a way, defaultdict is grouping your data:
from collections import defaultdict
from statistics import mean
d = defaultdict(list)
for number, name, day, day_number in l:
    d[(number, name, day)].append(day_number)
d = {key:mean(value) for key, value in d.items()}
d = [[*key[:2], [*key[2:],value]] for key, value in d.items()]
box = defaultdict(list)
for number, name, [day, day_number] in d:
    box[(number, name)].append([day, day_number])
[[*key, value] for key, value in box.items()]
[[1, 'John', [['Wed', 28], ['Fri', 30]]], [2, 'Alex', [['Fri', 50]]]]
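The two grouping passes can also be folded into one loop by nesting one defaultdict inside another. This is a minimal sketch of that variation rather than a replacement for the code above; the names grouped, days and result are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

l = [[1, 'John', 'Wed', 28], [1, 'John', 'Fri', 30],
     [2, 'Alex', 'Fri', 40], [2, 'Alex', 'Fri', 60]]

# Outer key: (id, name). Inner key: day. Inner value: list of marks.
grouped = defaultdict(lambda: defaultdict(list))
for number, name, day, marks in l:
    grouped[(number, name)][day].append(marks)

# Average the marks per day while building the nested list.
result = [
    [*key, [[day, mean(values)] for day, values in days.items()]]
    for key, days in grouped.items()
]
print(result)
```

Since dicts keep insertion order, the days appear in the order they first occur in l.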
Upvotes: 2
Reputation: 21729
You can do this (df here is the DataFrame built from the list, called o in the question):
# get mean marks
df['mean_marks'] = df.groupby(['id', 'name', 'day'])['marks'].transform('mean')
# create list of day, marks
df['mark_list'] = df[['day','mean_marks']].agg(list, 1)
# aggregate
df = (df
      .groupby(['id', 'name'])['mark_list']
      .apply(list)
      .apply(lambda x: [list(y) for y in set([tuple(j) for j in x])])
      .reset_index())
print(df)
   id  name               mark_list
0   1  John  [[Wed, 28], [Fri, 30]]
1   2  Alex             [[Fri, 50]]
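Note that deduplicating through set() does not guarantee the order of the day entries. If order matters, one possible variation is to compute the means first and then walk the (id, name) groups, as in the sketch below. The names means and result are illustrative, and sort=False is used on the assumption that first-appearance order is what is wanted.

```python
import pandas as pd

l = [[1, 'John', 'Wed', 28], [1, 'John', 'Fri', 30],
     [2, 'Alex', 'Fri', 40], [2, 'Alex', 'Fri', 60]]
df = pd.DataFrame(l, columns=['id', 'name', 'day', 'marks'])

# One row per (id, name, day); sort=False keeps first-appearance order.
means = (df.groupby(['id', 'name', 'day'], sort=False)['marks']
           .mean()
           .reset_index())

# Build the nested list by iterating over the (id, name) groups.
# int()/float() convert the NumPy scalars to plain Python numbers.
result = [
    [int(id_), name, [[day, float(m)] for day, m in zip(g['day'], g['marks'])]]
    for (id_, name), g in means.groupby(['id', 'name'], sort=False)
]
print(result)
```

The means come back as floats (for example 28.0 rather than 28), because mean() on an integer column returns a float column.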
Upvotes: 2