Reputation: 896
I was asked this question in an interview a while ago.
We have input data with the following columns:
language, product id, shelf id, rank
For instance, the input would look like this:
English, 742005, 4560, 10.2
English, 6000075389352, 4560, 49
French, 899883993, 4560, 32
French, 731317391, 7868, 81
We would like to group by the language and shelf id columns and sort each group's products by rank in descending order, producing one output record per group in the following format:
Language, shelf_id, {product_id:rank1, product_id:rank2 ....}
For the given input, the output would be the following:
English, 4560, {6000075389352:49, 742005:10.2}
French, 4560, {899883993:32}
French, 7868, {731317391:81}
I solved this problem by building a dictionary whose key combines the language and shelf id, and inserting the product id and rank under each key.
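A minimal sketch of that dictionary approach, assuming the rows have already been parsed into tuples (the name group_products and the tuple layout are illustrative, not taken from the original solution):

```python
from collections import defaultdict

# Sample rows from the question: (language, product_id, shelf_id, rank)
rows = [
    ("English", 742005, 4560, 10.2),
    ("English", 6000075389352, 4560, 49),
    ("French", 899883993, 4560, 32),
    ("French", 731317391, 7868, 81),
]

def group_products(rows):
    """Group (product_id, rank) pairs by (language, shelf_id), highest rank first."""
    groups = defaultdict(list)
    for language, product_id, shelf_id, rank in rows:
        groups[(language, shelf_id)].append((product_id, rank))
    # Sort each group's products by rank, descending.
    # Plain dicts keep insertion order in Python 3.7+.
    return {
        key: dict(sorted(products, key=lambda pair: pair[1], reverse=True))
        for key, products in groups.items()
    }

result = group_products(rows)
for (language, shelf_id), products in result.items():
    print(language, shelf_id, products)
```

Using a (language, shelf_id) tuple as the key avoids having to concatenate the two fields into a string.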
My method worked, but it looks like there is an easier way to do this with the Python pandas library. I've read some references, but I'm still not sure whether there is a better method than mine (building a dictionary keyed on the language and shelf id).
Any help would be greatly appreciated.
Upvotes: 6
Views: 6701
Reputation: 402293
Setup
import pandas as pd
df = pd.read_csv('file.csv', header=None)
df.columns = ['Lang', 'product_id', 'shelf_id', 'rank_id']
df
      Lang     product_id  shelf_id  rank_id
0  English         742005      4560     10.2
1  English  6000075389352      4560     49.0
2   French      899883993      4560     32.0
3   French      731317391      7868     81.0
You can use df.groupby to group by Lang and shelf_id. Then call apply on the grouped object to get a dictionary of {product_id: rank_id} for each group:
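# Sort by rank first so each group's dictionary is in descending rank order
(df.sort_values('rank_id', ascending=False)
   .groupby(['Lang', 'shelf_id'])
   .apply(lambda x: dict(zip(x['product_id'], x['rank_id'])))
   .reset_index(name='mapping'))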
      Lang  shelf_id                              mapping
0  English      4560  {6000075389352: 49.0, 742005: 10.2}
1   French      4560                    {899883993: 32.0}
2   French      7868                    {731317391: 81.0}
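For anyone who wants to try this without a file on disk, here is a self-contained sketch. It assumes the sample data is pasted inline (the raw string and io.StringIO stand in for file.csv), and it iterates over the groups explicitly instead of calling apply, which is a swapped-in variant rather than the exact code above:

```python
import io

import pandas as pd

# Inline copy of the sample input; stands in for 'file.csv'
raw = """English, 742005, 4560, 10.2
English, 6000075389352, 4560, 49
French, 899883993, 4560, 32
French, 731317391, 7868, 81
"""

df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=["Lang", "product_id", "shelf_id", "rank_id"],
    skipinitialspace=True,  # the sample has a space after each comma
)

# Sort by rank (descending) first; groupby keeps row order within each group
ordered = df.sort_values("rank_id", ascending=False)

records = []
for (lang, shelf_id), group in ordered.groupby(["Lang", "shelf_id"]):
    records.append({
        "Lang": lang,
        "shelf_id": shelf_id,
        "mapping": dict(zip(group["product_id"], group["rank_id"])),
    })

result = pd.DataFrame(records)
print(result)
```

Sorting before grouping is what puts the highest-ranked product first in each dictionary; the grouping step by itself does not reorder rows inside a group.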
Upvotes: 8