Reputation: 437
I have a problem related to this question: Aggregate column values in pandas GroupBy as a dict
My input data has the following columns:
For instance, the input would have the following format
language, product_id, shelf_id, rank, fiction, pages
English, 742005, 4560, 10.2, 1.0, 456
English, 6000075389352, 4560, 49, 1.0, 234
French, 899883993, 4560, 32, 0.0, 125
French, 731317391, 7868, 81, 1.0, 576
French, 235678655, 7868, 12, 1.0, 235
I would like to do "groupby" on language & shelf_id columns and get the list of remaining attributes for each product_id. The expected output should have the following format:
Language, shelf_id, {product_id: [rank, fiction, pages]}
for each grouped record.
For the given input, my DESIRED OUTPUT would be the following:
language, shelf_id, mapping
English, 4560, {742005: [10.2, 1.0, 456], 6000075389352: [49, 1.0, 234]}
French, 4560, {899883993: [32, 0.0, 125]}
French, 7868, {731317391: [81, 1.0, 576], 235678655: [12, 1.0, 235]}
The solution provided in the above mentioned post, solves the problem nicely if only one column needs to be considered (in the resulting dictionary):
df = pd.read_csv('file.csv', header=None)
df.columns = ['Lang', 'product_id', 'shelf_id', 'rank_id']
(df.groupby(['Lang', 'shelf_id'], as_index=False)
.apply(lambda x: dict(zip(x['product_id'], x['rank_id'])))
.reset_index(name='mapping'))
which would yield:
Lang shelf_id mapping
0 English 4560 {742005: 10.2, 6000075389352: 49.0}
1 French 4560 {899883993: 32.0}
2 French 7868 {731317391: 81.0, 235678655: 12.0}
Could anyone help me adapt this solution to my case? Any suggestions would be appreciated.
Upvotes: 1
Views: 559
Reputation: 71689
Idea is to create a new series s
containing the items as tuples
where the first item in tuple is product_id
and second item is the list containing corresponding values from columns rank
, fiction
and pages
, next we use Series.groupby
to group the series s
on language
and shelf_id
and aggregate the data as dictionary:
s = pd.Series([(k, v) for k, *v in zip(df['product_id'],
df['rank'], df['fiction'], df['pages'])])
df = s.groupby([df['language'], df['shelf_id']]).agg(
lambda d: dict(d.tolist())).reset_index(name='mapping')
Details:
#print(s)
0 (742005, [10.2, 1.0, 456]) # --> product_id: [rank, fiction, pages]
1 (6000075389352, [49.0, 1.0, 234])
2 (899883993, [32.0, 0.0, 125])
3 (731317391, [81.0, 1.0, 576])
4 (235678655, [12.0, 1.0, 235])
dtype: object
# print(df)
language shelf_id mapping
0 English 4560 {742005: [10.2, 1.0, 456], 6000075389352: [49.0, 1.0, 234]}
1 French 4560 {899883993: [32.0, 0.0, 125]}
2 French 7868 {731317391: [81.0, 1.0, 576], 235678655: [12.0, 1.0, 235]}
Upvotes: 1