Aggregate column values in pandas GroupBy as a dict

Question

This is a question I was asked in a past interview.

The input data has the following columns:

language, product id, shelf id, rank

For instance, the input would have the following format:

English, 742005, 4560, 10.2 
English, 6000075389352, 4560, 49
French, 899883993, 4560, 32
French, 731317391, 7868, 81

We would like to group by the language and shelf id columns and, within each group, sort the products by "rank" in descending order. Each output record would then have the following format:

Language, shelf_id, {product_id:rank1, product_id:rank2 ....}

For the given input, the output would be the following:

English, 4560, {6000075389352:49, 742005:10.2}
French, 4560, {899883993:32}
French, 7868, {731317391:81}

I solved this problem by building a dictionary keyed on the combination of language and shelf id, and inserting each product id and rank under the corresponding key.
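
The dictionary-based approach described above can be sketched roughly as follows. This is only an illustration under assumptions: the rows are taken to be already parsed into tuples, and the function name group_products is hypothetical, not part of the original solution.

```python
from collections import defaultdict

# Sample rows from the question: (language, product_id, shelf_id, rank)
rows = [
    ("English", 742005, 4560, 10.2),
    ("English", 6000075389352, 4560, 49),
    ("French", 899883993, 4560, 32),
    ("French", 731317391, 7868, 81),
]

def group_products(rows):
    """Group rows by (language, shelf_id) and order each group by rank, descending."""
    grouped = defaultdict(list)
    for language, product_id, shelf_id, rank in rows:
        grouped[(language, shelf_id)].append((product_id, rank))
    # dicts keep insertion order (Python 3.7+), so sorting the pairs first
    # yields a mapping ordered by descending rank
    return {
        key: dict(sorted(pairs, key=lambda pair: pair[1], reverse=True))
        for key, pairs in grouped.items()
    }

result = group_products(rows)
```

Here result maps each (language, shelf_id) tuple to a {product_id: rank} dictionary ordered by rank.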

My method worked, but it looks like there is an easier way to do this with the pandas library. I've read some references, but I'm still not sure whether there is a better method than the one described above.

Any help would be greatly appreciated.

cs95 · Accepted Answer

Setup

import pandas as pd

df = pd.read_csv('file.csv', header=None)
df.columns = ['Lang', 'product_id', 'shelf_id', 'rank_id']

df
      Lang     product_id  shelf_id  rank_id
0  English         742005      4560     10.2
1  English  6000075389352      4560     49.0
2   French      899883993      4560     32.0
3   French      731317391      7868     81.0

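You can sort by rank_id in descending order, then use df.groupby to group by Lang and shelf_id, and GroupBy.apply to build a dictionary of {product_id: rank_id} for each group. Rows keep their sorted order within each group, and dicts preserve insertion order in Python 3.7+:

(df.sort_values('rank_id', ascending=False)
   .groupby(['Lang', 'shelf_id'])
   .apply(lambda x: dict(zip(x['product_id'], x['rank_id'])))
   .reset_index(name='mapping'))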

      Lang  shelf_id                              mapping
0  English      4560  {6000075389352: 49.0, 742005: 10.2}
1   French      4560                    {899883993: 32.0}
2   French      7868                    {731317391: 81.0}
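
For a self-contained check of the grouping idea, the sketch below inlines the sample rows with io.StringIO (an assumption standing in for file.csv) and selects only the two value columns before calling apply, so the function never touches the grouping columns. The variable names sample and result are illustrative only.

```python
import io

import pandas as pd

# Inline copy of the sample data; stands in for 'file.csv' (assumption)
sample = io.StringIO(
    "English, 742005, 4560, 10.2\n"
    "English, 6000075389352, 4560, 49\n"
    "French, 899883993, 4560, 32\n"
    "French, 731317391, 7868, 81\n"
)
df = pd.read_csv(
    sample,
    header=None,
    names=["Lang", "product_id", "shelf_id", "rank_id"],
    skipinitialspace=True,  # drop the space after each comma
)

result = (
    df.sort_values("rank_id", ascending=False)    # highest rank first
      .groupby(["Lang", "shelf_id"])              # row order is kept within each group
      [["product_id", "rank_id"]]                 # pass only the value columns to apply
      .apply(lambda g: dict(zip(g["product_id"], g["rank_id"])))
      .reset_index(name="mapping")
)
```

Each entry in the mapping column is an ordinary dict, so result.loc[0, 'mapping'] can be iterated in descending rank order.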
