coderguy
coderguy

Reputation: 47

Very simple pandas column/row transform that I cannot figure out

I need to do a simple calculation on values in a dataframe, but I need some column transposed first. Once they are transposed I want to take the most recent amount / 2nd most recent amount and then the binary result if it less than or equal to .5

By most recent I mean most recent to the date in the Date 2 column

Have This

| Name | Amount |  Date 1                | Date 2     |
| -----| ----   |------------------------|------------|
| Jim  | 100    |   2021-06-10           | 2021-06-15 |
| Jim  | 200    |   2021-05-11           | 2021-06-15 |
| Jim  | 150    |   2021-03-5            | 2021-06-15 |
| Bob  | 350    |   2022-06-10           | 2022-08-30 |
| Bob  | 300    |   2022-08-12           | 2022-08-30 |
| Bob  | 400    |   2021-07-6            | 2022-08-30 |

I Want this
| Name | Amount | Date 2| Most Recent Amount(MRA) | 2nd Most Recent Amount(2MRA) | MRA / 2MRA| Less than or equal to .5 |
| -----| -------|------------------------|----------------|--------------------|-------------|--------------------------|
| Jim  | 100    |   2021-06-15           |       100      |       200          | .5          | 1                        |
| Bob  | 300    |   2022-08-30           |       300      |       350          | .85         | 0                        |

Upvotes: 0

Views: 68

Answers (3)

Behzad Aslani Avilaq
Behzad Aslani Avilaq

Reputation: 286

This is the original dataframe.

df = pd.DataFrame({'Name':['Jim','Jim','Jim','Bob','Bob','Bob'],
               'Amount':[100,200,150,350,300,400],
               'Date 1':['2021-06-10','2021-05-11','2021-03-05','2022-06-10','2022-08-12','2021-07-06'],
               'Date 2':['2021-06-15','2021-06-15','2021-06-15','2022-08-30','2022-08-30','2022-08-30']
              })

And this is the results.

# here we take the gropby of the 'Name' column
g = df.sort_values('Date 1', ascending=False).groupby(['Name'])

# then we use the agg function to get the first of 'Date 2' and 'Amount' columns
# and then rename result of the 'Amount' column to 'MRA' 
first = g.agg({'Date 2':'first','Amount':'first'}).rename(columns={'Amount':'MRA'}).reset_index()
# Similarly, we take the second values by applying a lambda function
second = g.agg({'Date 2':'first','Amount':lambda t: t.iloc[1]}).rename(columns={'Amount':'2MRA'}).reset_index()
df_T = pd.merge(first, second, on=['Name','Date 2'], how='left')

# then we use this function to add two desired columns
def operator(x):
    return x['MRA']/x['2MRA'], [1 if x['MRA']/x['2MRA']<=.5 else 0][0]

# we apply the operator function to add 'MRA/2MRA' and 'Less than or equal to .5' columns
df_T['MRA/2MRA'], df_T['Less than or equal to .5'] = zip(*df_T.apply(operator, axis=1))

Hope this helps. :)

Upvotes: 2

constantstranger
constantstranger

Reputation: 9379

One way to do what you've asked is:

df = ( df[df['Date 1'] <= df['Date 2']]
        .groupby('Name', sort=False)['Date 1'].nlargest(2)
        .reset_index(level=0)
        .assign(**{
            'Amount': df.Amount, 
            'Date 2': df['Date 2'],
            'recency': ['MRA','MRA2']*len(set(df.Name.tolist()))
        })
        .pivot(index=['Name','Date 2'], columns='recency', values='Amount')
        .reset_index().rename_axis(columns=None) )
df = df.assign(**{'Amount':df.MRA, 'MRA / MRA2': df.MRA/df.MRA2})
df = df.assign(**{'Less than or equal to .5': (df['MRA / MRA2'] <= 0.5).astype(int)})
df = pd.concat([df[['Name', 'Amount']], df.drop(columns=['Name', 'Amount'])], axis=1)

Input:

  Name  Amount     Date 1     Date 2
0  Jim     100 2021-06-10 2021-06-15
1  Jim     200 2021-05-11 2021-06-15
2  Jim     150 2021-03-05 2021-06-15
3  Bob     350 2022-06-10 2022-08-30
4  Bob     300 2022-08-12 2022-08-30
5  Bob     400 2021-07-06 2022-08-30

Output:

  Name  Amount     Date 2  MRA  MRA2  MRA / MRA2  Less than or equal to .5
0  Bob     300 2022-08-30  300   350    0.857143                         0
1  Jim     100 2021-06-15  100   200    0.500000                         1

Explanation:

  • Filter only for rows where Date 1 <= Date 2
  • Use groupby() and nlargest() to get the 2 most recent Date 1 values per Name
  • Use assign() to add back the Amount and Date 2 columns and create a recency column containing MRA and MRA2 for the pair of rows corresponding to each Name value
  • Use pivot() to turn the recency values MRA and MRA2 into column labels
  • Use reset_index() to restore Name and Date 2 to columns, and use rename_axis() to make the columns index anonymous
  • Use assign() once to restore Amount and add column MRA / MRA2, and again to add column named Less than or equal to .5
  • Use concat(), [] and drop() to rearrange the columns to match the output sequence shown in the question.

Upvotes: 1

David_Leber
David_Leber

Reputation: 156

Here's the rough procedure you want:

  1. sort_values by Name and Date 1 to get the data in order.
  2. shift to get the previous date and 2nd most recent amount fields
  3. Filter the dataframe for Date 1 <= Date 2.
  4. group_by by Name and use head to get only the first row.

Now, your Amount column is your Most Recent Amount and your Shifted Amount column is the 2nd Most Recent amount. From there, you can do a simple division to get the ratio.

Upvotes: 0

Related Questions