Use Look Up Table with Dictionary of Dataframes

Question

I have a dictionary of single-column dataframes (Di_G). I would like to use the index of each dataframe in Di_G to look up a value in another dictionary (Di_A), and then divide each value of each dataframe in Di_G by its corresponding value from Di_A.

import pandas as pd

# Data
df_1 = pd.DataFrame({'Box' : [1006,1006,1006,1006,1006,1006,1007,1007,1007,1007,1008,1008,1008,1009,1009,1010,1011,1011,1012,1013],
                     'Item': [  40,  41,  42,  43,  44,  45,  40,  43,  44,  45,  43,  44,  45,  40,  41,  40,  44,  45,  44,  45]})


df_2 = pd.DataFrame({'Box' : [1006,1007,1008,1009,1010,1011,1012,1013,1014],
                     'Type': [ 103, 101, 102, 102, 102, 103, 103, 103, 103]})

# Join (left join on Box; 'outer' was being passed to set_index, not to join, so it never applied)
df_J = df_1.set_index('Box').join(df_2.set_index('Box'))

# Count how many Boxes contain each Item - Count Boxes ( Item )
df_G = df_J.groupby('Item').size()
Di_A = df_G.to_dict()

# Group the Boxes by Type
Ma_G = df_J .groupby('Type')

Di_1 = {}
for name, group in Ma_G:
    Di_1[str(name)] = group

# Count how many Boxes of each Type contain each Item - Count Boxes ( Item │ Type )
Di_G = {}
for k in Di_1:
    Di_G[k] = Di_1[k].groupby('Item').size()

I tried doing it like this:

# Pr ( Type │ Item ) = Count Boxes ( Item │ Type ) / Count Boxes ( Item )
for k in Di_G:
    Di_G[k]['Pr'] = Di_G[k]['0'] / Di_G[k]['Index'].map(Di_A)

I got "KeyError: '0'".

I tried changing the column names in Di_G & Di_A but I had difficulty doing this.

BENY · Accepted Answer

I think you just need transform:

df_J.groupby(['Item','Type']).Item.transform('count')/df_J.groupby('Item').Item.transform('count')
Out[298]: 
Box
1006    0.250000
1006    0.500000
1006    1.000000
1006    0.333333
1006    0.600000
1006    0.600000
1007    0.250000
1007    0.333333
1007    0.200000
1007    0.200000
1008    0.333333
1008    0.200000
1008    0.200000
1009    0.500000
1009    0.500000
1010    0.500000
1011    0.600000
1011    0.600000
1012    0.600000
1013    0.600000
Name: Item, dtype: float64
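
Because transform returns one value per original row, the ratio can be stored straight back on df_J as a new column. A minimal sketch, assuming the left join from the question; the column name 'Pr' and the helper names per_type and per_item are only illustrative:

```python
import pandas as pd

df_1 = pd.DataFrame({'Box' : [1006,1006,1006,1006,1006,1006,1007,1007,1007,1007,
                              1008,1008,1008,1009,1009,1010,1011,1011,1012,1013],
                     'Item': [  40,  41,  42,  43,  44,  45,  40,  43,  44,  45,
                                43,  44,  45,  40,  41,  40,  44,  45,  44,  45]})
df_2 = pd.DataFrame({'Box' : [1006,1007,1008,1009,1010,1011,1012,1013,1014],
                     'Type': [ 103, 101, 102, 102, 102, 103, 103, 103, 103]})

# Left join on Box, as in the question
df_J = df_1.set_index('Box').join(df_2.set_index('Box'))

# Count Boxes ( Item | Type ), broadcast back to every row
per_type = df_J.groupby(['Item', 'Type'])['Item'].transform('count')
# Count Boxes ( Item ), broadcast back to every row
per_item = df_J.groupby('Item')['Item'].transform('count')

# Both results share df_J's index and length, so they divide row by row
df_J['Pr'] = per_type / per_item
print(df_J.head())
```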

Also, to better match your expected output:

G=df_J.groupby(['Item','Type']).size()

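# Series.sum(level=...) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
G.div(G.groupby(level=0).sum(), level=0)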
Out[303]: 
Item  Type
40    101     0.250000
      102     0.500000
      103     0.250000
41    102     0.500000
      103     0.500000
42    103     1.000000
43    101     0.333333
      102     0.333333
      103     0.333333
44    101     0.200000
      102     0.200000
      103     0.600000
45    101     0.200000
      102     0.200000
      103     0.600000
dtype: float64
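
If you would rather keep the dictionary layout from the question, the KeyError comes from the fact that groupby(...).size() returns a Series, not a DataFrame, so there is no column called '0' or 'Index' to select. A rough sketch of the same loop working on the Series directly; Di_Pr is a hypothetical name for the result, and the left join from the question is assumed:

```python
import pandas as pd

df_1 = pd.DataFrame({'Box' : [1006,1006,1006,1006,1006,1006,1007,1007,1007,1007,
                              1008,1008,1008,1009,1009,1010,1011,1011,1012,1013],
                     'Item': [  40,  41,  42,  43,  44,  45,  40,  43,  44,  45,
                                43,  44,  45,  40,  41,  40,  44,  45,  44,  45]})
df_2 = pd.DataFrame({'Box' : [1006,1007,1008,1009,1010,1011,1012,1013,1014],
                     'Type': [ 103, 101, 102, 102, 102, 103, 103, 103, 103]})
df_J = df_1.set_index('Box').join(df_2.set_index('Box'))

# Count Boxes ( Item ) as a plain dict: {Item: count}
Di_A = df_J.groupby('Item').size().to_dict()

# Count Boxes ( Item | Type ): one Series per Type, indexed by Item
Di_G = {str(name): group.groupby('Item').size()
        for name, group in df_J.groupby('Type')}

# Look up each Item label in Di_A, keeping Item as the index so that
# the division aligns label by label with the counts in Di_G
Di_Pr = {}
for k, counts in Di_G.items():
    totals = counts.index.to_series().map(Di_A)
    Di_Pr[k] = counts / totals

print(Di_Pr['103'])
```

Each entry of Di_Pr is a Series indexed by Item holding Pr ( Type | Item ) for that Type, which is the same information as the grouped result above, split by Type.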
