Assign subset of values to pandas dataframe with MultiIndex

Question

I have a DataFrame df:

                      Count
    Environment Type
    A           a       100
                b       200
                c       300
                d       400
                e       500
                f       600
    B           a      1000
                b      2000
                c      3000
                d      4000
                e      5000
                f      6000

df.index returns the following:

    MultiIndex(levels=[['A', 'B'], ['a', 'b', 'c', 'd', 'e', 'f']],
               labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1], 
                       [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]],
               names=['A', 'B'])

I need to calculate each Count as a percentage of the total for its group (A and B). So I do:

    sums = df.groupby(level=0).sum()
    df.loc['A'] = df.loc['A'].apply(lambda x: x/sums.loc['A', 'Count'])
    df.loc['B'] = df.loc['B'].apply(lambda x: x/sums.loc['B', 'Count'])

However, this results in all values being NaN.

I suspect that the index of df.loc['B'].apply(lambda x: x/sums.loc['B','Count']) is not the same as the index of df, although it should match the part of df that I am selecting.

These expressions by themselves

    df.loc['A'].apply(lambda x: x/sums.loc['A', 'Count'])
    df.loc['B'].apply(lambda x: x/sums.loc['B', 'Count'])

have the values I need, so the division works, but the assignment does not.

How do I assign the result of the expressions above to the corresponding part of the dataframe df?

Bharath M Shetty · Accepted Answer

You can simply do df/sums; there is no need for a loop.
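As a rough sketch of the whole-frame division, assuming the frame from the question is rebuilt as shown here (the index names Environment and Type are taken from the displayed table, and the variable name pct is made up for illustration). It uses df.div(sums, level=0), which spells out the alignment on the first index level instead of relying on the / operator to pick it up:

```python
import pandas as pd

# Rebuild the frame from the question (index names assumed from the display).
index = pd.MultiIndex.from_product(
    [["A", "B"], list("abcdef")], names=["Environment", "Type"]
)
counts = [100, 200, 300, 400, 500, 600, 1000, 2000, 3000, 4000, 5000, 6000]
df = pd.DataFrame({"Count": counts}, index=index)

# One total per first-level group.
sums = df.groupby(level=0).sum()

# Divide every row by the total of its own group; level=0 tells pandas
# to broadcast sums across the first level of df's MultiIndex.
pct = df.div(sums, level=0)

print(pct)
```

Each group in pct then adds up to 1, and df itself is left unchanged.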

Since you want to assign to a particular part of the dataframe, you can do it this way. Keep the index of the computed df one level deeper, so that it still carries the first level and lines up with df.

    df.loc['A', :] = df.loc['A', :, :].apply(lambda x: x/sums.loc['A', 'Count'])
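Another way to keep both index levels on the right-hand side, sketched here under the assumption that the frame looks like the one in the question, is to select the rows with a boolean mask on the first level. This is a different technique from the line above, not a restatement of it, and the name mask_a is hypothetical. The column is created as float so that the fractions can be stored in place:

```python
import pandas as pd

# Rebuild the frame from the question (index names assumed from the display).
index = pd.MultiIndex.from_product(
    [["A", "B"], list("abcdef")], names=["Environment", "Type"]
)
counts = [100, 200, 300, 400, 500, 600, 1000, 2000, 3000, 4000, 5000, 6000]
# Float dtype up front, so fractional values fit into the existing column.
df = pd.DataFrame({"Count": counts}, index=index, dtype="float64")

sums = df.groupby(level=0).sum()

# A boolean mask keeps the full MultiIndex on the selected rows,
# whereas df.loc['A'] drops the first level.
mask_a = df.index.get_level_values(0) == "A"
df.loc[mask_a, "Count"] = df.loc[mask_a, "Count"] / sums.loc["A", "Count"]

print(df)
```

Because both sides of the assignment share the same two-level index, the values align and no NaN appears; the rows for B are left untouched.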
