Reputation: 83
I have data in a pandas dataframe that looks like this:
In [1]: import pandas as pd
In [2]: toy = pd.DataFrame({'Date':['7/1/2018','7/2/2018','7/3/2018','7/4/2018','7/2/2018','7/3/2018','7/4/2018','7/5/2018'],
'A':[1,2,3,4,5,6,7,8],'B':[8,7,6,5,4,3,2,1],'C':[4,7,2,6,4,2,2,1],
'key':['key1','key1','key1','key1','key2','key2','key2','key2']})
In [3]: toy
Out[3]:
A B C Date key
0 1 8 4 7/1/2018 key1
1 2 7 7 7/2/2018 key1
2 3 6 2 7/3/2018 key1
3 4 5 6 7/4/2018 key1
4 5 4 4 7/2/2018 key2
5 6 3 2 7/3/2018 key2
6 7 2 2 7/4/2018 key2
7 8 1 1 7/5/2018 key2
In [4]: toy.groupby('key')['A'].last()
Out[4]:
key
key1 4
key2 8
Name: A, dtype: int64
For each unique key, I want to return the value in column A for the most recent date. So for key1 the most recent day is 7/4/2018 and the value in column A is 4 and for key2 the most recent day is 7/5/2018 and the value in column A is 8.
Then I want to divide all the data in columns B & C (and in reality many more columns) by 4 if key1 and 8 if key2. For my actual data I have thousands of keys and 100 or so columns I would like to divide by the key-last-column A value. The date ranges are independent for each key.
I know that, assuming the DataFrame is sorted ascending by date, I can get the denominator by using:
toy.groupby('key')['A'].last()
But I can't figure out how to divide the values in each column B & C for each key by this value.
Hopefully I've formulated the problem clearly; happy to clarify as necessary.
Upvotes: 2
Views: 133
Reputation: 51165
Use transform('last') instead of last, then simply divide:
toy[['B', 'C']].div(toy.groupby('key').A.transform('last'), axis=0)
B C
0 2.000 1.000
1 1.750 1.750
2 1.500 0.500
3 1.250 1.500
4 0.500 0.500
5 0.375 0.250
6 0.250 0.250
7 0.125 0.125
To show why this works:
toy.groupby('key').A.transform('last')
0 4
1 4
2 4
3 4
4 8
5 8
6 8
7 8
Name: A, dtype: int64
By using transform('last') instead of last, every element in the series is "transformed" to the last value in its group, so the result keeps the same index as the original DataFrame. Then we can simply divide each row by the equivalent value in this result.
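Note that 'last' picks whichever row comes last in each group, so this relies on the rows being sorted by date. Below is a minimal sketch that sorts first and divides every remaining column. It assumes the Date strings are month/day/year and that every column other than Date, key and A should be divided. The names value_cols, divisor and scaled are introduced here for illustration only.

```python
import pandas as pd

toy = pd.DataFrame({
    'Date': ['7/1/2018', '7/2/2018', '7/3/2018', '7/4/2018',
             '7/2/2018', '7/3/2018', '7/4/2018', '7/5/2018'],
    'A': [1, 2, 3, 4, 5, 6, 7, 8],
    'B': [8, 7, 6, 5, 4, 3, 2, 1],
    'C': [4, 7, 2, 6, 4, 2, 2, 1],
    'key': ['key1'] * 4 + ['key2'] * 4,
})

# Assumption: dates are month/day/year strings; parse them so that
# sorting is chronological rather than alphabetical.
toy['Date'] = pd.to_datetime(toy['Date'], format='%m/%d/%Y')
toy = toy.sort_values(['key', 'Date'])

# Assumption: every column except these three should be divided.
value_cols = toy.columns.difference(['Date', 'key', 'A'])

# Per-row divisor: the value of A on the most recent date of each key.
divisor = toy.groupby('key')['A'].transform('last')

# axis=0 aligns the divisor with the rows (by index), not the columns.
scaled = toy[value_cols].div(divisor, axis=0)
print(scaled)
```

With 100 or so columns, building value_cols from the column index avoids listing them by hand.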
Upvotes: 3
Reputation: 338
I am not sure if this is the correct way to do this, but this should help.
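# Last value of A per key (assumes rows are sorted ascending by date within each key)
toy_gby = toy.groupby('key')
keys_values = toy_gby['A'].last()
# Turn the result into a DataFrame with 'key' as a column; rename A to avoid a name clash on merge
keys_df = keys_values.reset_index().rename(columns={'A': 'new_column'})
# Attach the per-key divisor to every row, then divide
merged_df = pd.merge(toy, keys_df, on='key')
merged_df['B'] = merged_df['B'] / merged_df['new_column']
merged_df['C'] = merged_df['C'] / merged_df['new_column']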
The idea is to create a new column that holds the divisor for each row, and then divide the other columns by it.
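The same per-row divisor can probably be built without a merge, by mapping each row's key onto the per-key result of last(). This is a minimal sketch, assuming toy is already sorted ascending by date within each key as in the question. The names last_a, divisor, cols and result are introduced here for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    'Date': ['7/1/2018', '7/2/2018', '7/3/2018', '7/4/2018',
             '7/2/2018', '7/3/2018', '7/4/2018', '7/5/2018'],
    'A': [1, 2, 3, 4, 5, 6, 7, 8],
    'B': [8, 7, 6, 5, 4, 3, 2, 1],
    'C': [4, 7, 2, 6, 4, 2, 2, 1],
    'key': ['key1'] * 4 + ['key2'] * 4,
})

# One value per key: the last A in each group (indexed by key).
last_a = toy.groupby('key')['A'].last()

# Look up each row's key in last_a; the result keeps toy's row index.
divisor = toy['key'].map(last_a)

# Columns to divide; extend this list as needed.
cols = ['B', 'C']

result = toy.copy()
result[cols] = toy[cols].div(divisor, axis=0)
print(result)
```

Unlike the merge, this keeps the original row index and does not add a helper column to the frame.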
Upvotes: 0