whamp

Reputation: 83

Pandas Groupby Divide by Last in Sequence by Unique Key

I have data in a pandas dataframe that looks like this:

In [1]: import pandas as pd

In [2]: toy = pd.DataFrame({'Date':['7/1/2018','7/2/2018','7/3/2018','7/4/2018','7/2/2018','7/3/2018','7/4/2018','7/5/2018'],
                    'A':[1,2,3,4,5,6,7,8],'B':[8,7,6,5,4,3,2,1],'C':[4,7,2,6,4,2,2,1],
                    'key':['key1','key1','key1','key1','key2','key2','key2','key2']})

In [3]: toy
Out[3]:
   A  B  C      Date   key
0  1  8  4  7/1/2018  key1
1  2  7  7  7/2/2018  key1
2  3  6  2  7/3/2018  key1
3  4  5  6  7/4/2018  key1
4  5  4  4  7/2/2018  key2
5  6  3  2  7/3/2018  key2
6  7  2  2  7/4/2018  key2
7  8  1  1  7/5/2018  key2

In [4]: toy.groupby('key')['A'].last()
Out[4]:
key
key1    4
key2    8
Name: A, dtype: int64

For each unique key, I want the value in column A on the most recent date. For key1 the most recent date is 7/4/2018, where A is 4; for key2 it is 7/5/2018, where A is 8.

Then I want to divide all the data in columns B & C (and in reality many more columns) by 4 if key1 and 8 if key2. For my actual data I have thousands of keys and 100 or so columns I would like to divide by the key-last-column A value. The date ranges are independent for each key.

I know that, assuming the DataFrame is sorted ascending by date, I can get the denominator by using:

toy.groupby('key')['A'].last()

But I can't figure out how to divide the values in each column B & C for each key by this value.

Hopefully I've formulated the problem clearly; happy to clarify as necessary.

Upvotes: 2

Views: 133

Answers (2)

user3483203

Reputation: 51165

Use transform('last') instead of last, then simply divide:

toy[['B', 'C']].div(toy.groupby('key').A.transform('last'), axis=0)

       B      C
0  2.000  1.000
1  1.750  1.750
2  1.500  0.500
3  1.250  1.500
4  0.500  0.500
5  0.375  0.250
6  0.250  0.250
7  0.125  0.125

To show why this works:

toy.groupby('key').A.transform('last')

0    4
1    4
2    4
3    4
4    8
5    8
6    8
7    8
Name: A, dtype: int64

By using transform('last') instead of last, every row receives the last value of A within its own group, so the result has the same length and index as the original DataFrame. Then we can simply divide each row by the matching value in this result.
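For the real data with around 100 columns, the same idea can be applied without listing the columns by hand. The following is only a sketch under some assumptions that are not in the question: that the dates are in month/day/year format, that every column other than Date, key and A should be divided, and the names denom, value_cols and scaled are made up for illustration. It also sorts by parsed date first, since last only gives the most recent row if the frame is in chronological order.

```python
import pandas as pd

toy = pd.DataFrame({
    'Date': ['7/1/2018', '7/2/2018', '7/3/2018', '7/4/2018',
             '7/2/2018', '7/3/2018', '7/4/2018', '7/5/2018'],
    'A': [1, 2, 3, 4, 5, 6, 7, 8],
    'B': [8, 7, 6, 5, 4, 3, 2, 1],
    'C': [4, 7, 2, 6, 4, 2, 2, 1],
    'key': ['key1'] * 4 + ['key2'] * 4,
})

# Assumption: dates are month/day/year. Parsing them makes the sort
# chronological instead of a plain string sort.
toy['Date'] = pd.to_datetime(toy['Date'], format='%m/%d/%Y')
toy = toy.sort_values(['key', 'Date'])

# Per-row denominator: the last A of each key, aligned to toy's index.
denom = toy.groupby('key')['A'].transform('last')

# Assumption: every column except the identifiers and A gets divided.
value_cols = toy.columns.difference(['Date', 'key', 'A'])

scaled = toy.copy()
scaled[value_cols] = toy[value_cols].div(denom, axis=0)
```

Here scaled keeps Date, key and A untouched and replaces only the value columns, so the original toy frame stays available for comparison.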

Upvotes: 3

NAGA

Reputation: 338

I am not sure if this is the correct way to do this, but this should help.

toy_gby = toy.groupby('key')

keys_values = toy_gby['A'].last()

keys_df = pd.DataFrame(keys_values)

keys_df = keys_df.reset_index()

keys_df = keys_df.rename(columns = {'A':'new_column'})

merged_df = pd.merge(toy,keys_df, on =['key'])

merged_df['B'] = merged_df['B']/merged_df['new_column']
merged_df['C'] = merged_df['C']/merged_df['new_column']

The idea is to merge in a new column that holds, for every row, the value to divide by, and then divide each of the other columns by it.
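The same lookup can probably be done without building an intermediate DataFrame. As a sketch of a different technique from the merge above, Series.map can look up each row's key in the per-key result of last. The names last_a, denom and result are made up for illustration, and the frame is assumed to be sorted by date within each key already.

```python
import pandas as pd

toy = pd.DataFrame({
    'Date': ['7/1/2018', '7/2/2018', '7/3/2018', '7/4/2018',
             '7/2/2018', '7/3/2018', '7/4/2018', '7/5/2018'],
    'A': [1, 2, 3, 4, 5, 6, 7, 8],
    'B': [8, 7, 6, 5, 4, 3, 2, 1],
    'C': [4, 7, 2, 6, 4, 2, 2, 1],
    'key': ['key1'] * 4 + ['key2'] * 4,
})

# One value per key: the last A in each group (index is the key).
last_a = toy.groupby('key')['A'].last()

# Look up each row's key in last_a to get a per-row denominator.
denom = toy['key'].map(last_a)

result = toy.copy()
for col in ['B', 'C']:
    result[col] = toy[col] / denom
```

This avoids the rename and merge steps, at the cost of being a little less explicit about where the denominator comes from.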

Upvotes: 0
