Reputation: 47
I need to do a simple calculation on values in a dataframe, but I need some column transposed first. Once they are transposed I want to take the most recent amount / 2nd most recent amount and then the binary result if it less than or equal to .5
By most recent I mean most recent to the date in the Date 2 column
Have This
| Name | Amount | Date 1 | Date 2 |
| -----| ---- |------------------------|------------|
| Jim | 100 | 2021-06-10 | 2021-06-15 |
| Jim | 200 | 2021-05-11 | 2021-06-15 |
| Jim | 150 | 2021-03-5 | 2021-06-15 |
| Bob | 350 | 2022-06-10 | 2022-08-30 |
| Bob | 300 | 2022-08-12 | 2022-08-30 |
| Bob | 400 | 2021-07-6 | 2022-08-30 |
I Want this
| Name | Amount | Date 2| Most Recent Amount(MRA) | 2nd Most Recent Amount(2MRA) | MRA / 2MRA| Less than or equal to .5 |
| -----| -------|------------------------|----------------|--------------------|-------------|--------------------------|
| Jim | 100 | 2021-06-15 | 100 | 200 | .5 | 1 |
| Bob | 300 | 2022-08-30 | 300 | 350 | .85 | 0 |
Upvotes: 0
Views: 68
Reputation: 286
This is the original dataframe.
df = pd.DataFrame({'Name':['Jim','Jim','Jim','Bob','Bob','Bob'],
'Amount':[100,200,150,350,300,400],
'Date 1':['2021-06-10','2021-05-11','2021-03-05','2022-06-10','2022-08-12','2021-07-06'],
'Date 2':['2021-06-15','2021-06-15','2021-06-15','2022-08-30','2022-08-30','2022-08-30']
})
And this is the results.
# here we take the gropby of the 'Name' column
g = df.sort_values('Date 1', ascending=False).groupby(['Name'])
# then we use the agg function to get the first of 'Date 2' and 'Amount' columns
# and then rename result of the 'Amount' column to 'MRA'
first = g.agg({'Date 2':'first','Amount':'first'}).rename(columns={'Amount':'MRA'}).reset_index()
# Similarly, we take the second values by applying a lambda function
second = g.agg({'Date 2':'first','Amount':lambda t: t.iloc[1]}).rename(columns={'Amount':'2MRA'}).reset_index()
df_T = pd.merge(first, second, on=['Name','Date 2'], how='left')
# then we use this function to add two desired columns
def operator(x):
return x['MRA']/x['2MRA'], [1 if x['MRA']/x['2MRA']<=.5 else 0][0]
# we apply the operator function to add 'MRA/2MRA' and 'Less than or equal to .5' columns
df_T['MRA/2MRA'], df_T['Less than or equal to .5'] = zip(*df_T.apply(operator, axis=1))
Hope this helps. :)
Upvotes: 2
Reputation: 9379
One way to do what you've asked is:
df = ( df[df['Date 1'] <= df['Date 2']]
.groupby('Name', sort=False)['Date 1'].nlargest(2)
.reset_index(level=0)
.assign(**{
'Amount': df.Amount,
'Date 2': df['Date 2'],
'recency': ['MRA','MRA2']*len(set(df.Name.tolist()))
})
.pivot(index=['Name','Date 2'], columns='recency', values='Amount')
.reset_index().rename_axis(columns=None) )
df = df.assign(**{'Amount':df.MRA, 'MRA / MRA2': df.MRA/df.MRA2})
df = df.assign(**{'Less than or equal to .5': (df['MRA / MRA2'] <= 0.5).astype(int)})
df = pd.concat([df[['Name', 'Amount']], df.drop(columns=['Name', 'Amount'])], axis=1)
Input:
Name Amount Date 1 Date 2
0 Jim 100 2021-06-10 2021-06-15
1 Jim 200 2021-05-11 2021-06-15
2 Jim 150 2021-03-05 2021-06-15
3 Bob 350 2022-06-10 2022-08-30
4 Bob 300 2022-08-12 2022-08-30
5 Bob 400 2021-07-06 2022-08-30
Output:
Name Amount Date 2 MRA MRA2 MRA / MRA2 Less than or equal to .5
0 Bob 300 2022-08-30 300 350 0.857143 0
1 Jim 100 2021-06-15 100 200 0.500000 1
Explanation:
Date 1
<= Date 2
groupby()
and nlargest()
to get the 2 most recent Date 1
values per Name
assign()
to add back the Amount
and Date 2
columns and create a recency
column containing MRA
and MRA2
for the pair of rows corresponding to each Name
valuepivot()
to turn the recency
values MRA
and MRA2
into column labelsreset_index()
to restore Name
and Date 2
to columns, and use rename_axis()
to make the columns
index anonymousassign()
once to restore Amount
and add column MRA / MRA2
, and again to add column named Less than or equal to .5
concat()
, []
and drop()
to rearrange the columns to match the output sequence shown in the question.Upvotes: 1
Reputation: 156
Here's the rough procedure you want:
sort_values
by Name and Date 1 to get the data in order.shift
to get the previous date and 2nd most recent amount fieldsgroup_by
by Name and use head
to get only the first row.Now, your Amount column is your Most Recent Amount and your Shifted Amount column is the 2nd Most Recent amount. From there, you can do a simple division to get the ratio.
Upvotes: 0