Pandas Groupby Divide by Last in Sequence by Unique Key

Question

I have data in a pandas dataframe that looks like this:

In [1]: import pandas as pd

In [2]: toy = pd.DataFrame({'Date':['7/1/2018','7/2/2018','7/3/2018','7/4/2018','7/2/2018','7/3/2018','7/4/2018','7/5/2018'],
                    'A':[1,2,3,4,5,6,7,8],'B':[8,7,6,5,4,3,2,1],'C':[4,7,2,6,4,2,2,1],
                    'key':['key1','key1','key1','key1','key2','key2','key2','key2']})

In [3]: toy
Out[3]:
   A  B  C      Date   key
0  1  8  4  7/1/2018  key1
1  2  7  7  7/2/2018  key1
2  3  6  2  7/3/2018  key1
3  4  5  6  7/4/2018  key1
4  5  4  4  7/2/2018  key2
5  6  3  2  7/3/2018  key2
6  7  2  2  7/4/2018  key2
7  8  1  1  7/5/2018  key2

In [4]: toy.groupby('key')['A'].last()
Out[4]:
key
key1    4
key2    8
Name: A, dtype: int64

For each unique key, I want to return the value in column A for the most recent date. So for key1 the most recent day is 7/4/2018 and the value in column A is 4 and for key2 the most recent day is 7/5/2018 and the value in column A is 8.

Then I want to divide all the data in columns B & C (and in reality many more columns) by 4 if key1 and 8 if key2. For my actual data I have thousands of keys and 100 or so columns I would like to divide by the key-last-column A value. The date ranges are independent for each key.

I know that, assuming the DataFrame is sorted ascending by date, I can get the denominator by using:

toy.groupby('key')['A'].last()

But I can't figure out how to divide the values in each column B & C for each key by this value.

Hopefully I've formulated the problem clearly; happy to clarify as necessary.

user3483203 · Accepted Answer

Use transform('last') instead of last, then simply divide (df here is the toy DataFrame from the question):

df[['B', 'C']].div(df.groupby('key').A.transform('last'), axis=0)

       B      C
0  2.000  1.000
1  1.750  1.750
2  1.500  0.500
3  1.250  1.500
4  0.500  0.500
5  0.375  0.250
6  0.250  0.250
7  0.125  0.125
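
For reference, here is a self-contained sketch of the same division that also writes the result back and scales every value column at once. It rebuilds the question's data under the name df; the names denom, value_cols and scaled are my own, and the rule "every column except Date, key and A gets divided" is an assumption about what the real 100-column data looks like.

```python
import pandas as pd

# The toy frame from the question, bound to the name `df` used in this answer.
df = pd.DataFrame({
    'Date': ['7/1/2018', '7/2/2018', '7/3/2018', '7/4/2018',
             '7/2/2018', '7/3/2018', '7/4/2018', '7/5/2018'],
    'A': [1, 2, 3, 4, 5, 6, 7, 8],
    'B': [8, 7, 6, 5, 4, 3, 2, 1],
    'C': [4, 7, 2, 6, 4, 2, 2, 1],
    'key': ['key1'] * 4 + ['key2'] * 4,
})

# Per-row denominator: the last A value within each key's group.
denom = df.groupby('key')['A'].transform('last')

# Assumption: every column other than Date, key and A should be scaled.
value_cols = df.columns.difference(['Date', 'key', 'A'])

# Work on a copy so the original frame stays untouched.
scaled = df.copy()
scaled[value_cols] = df[value_cols].div(denom, axis=0)
```

With thousands of keys this should still be a single vectorized division, since transform('last') produces the whole denominator column in one pass.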

To show why this works:

df.groupby('key').A.transform('last')

0    4
1    4
2    4
3    4
4    8
5    8
6    8
7    8
Name: A, dtype: int64

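By using transform('last') instead of last, the result keeps one entry per original row (aligned to the original index), and each entry holds the last value of A within that row's group. Because the two objects share an index, div(..., axis=0) divides each row of B and C by the value for its own key.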
