LeCoconutWhisperer
LeCoconutWhisperer

Reputation: 319

Perform shift and other operations on a pandas dataframe

Here is a sample of my data

import pandas as pd

dic = {'Drug': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'], 
       'Date': ['01-01-20', '01-02-20', '01-03-20', '01-04-20', '01-05-20', '01-10-20', '01-15-20', '01-20-20', '01-21-20', '01-01-20', '01-02-20', '01-03-20', '01-04-20', '01-05-20'],
       'Amount': [10, 20, 30, 40, 50,60, 70, 80, 90, 10, 20, 30, 40, 50]}
df = pd.DataFrame(dic)
| Drug | Date     | Amount |
| ---- | -------- | ------ |
| A    | 01-01-20 | 10     |
|      | 01-02-20 | 20     |
|      | 01-03-20 | 30     |
|      | 01-04-20 | 40     |
|      | 01-05-20 | 50     |
|      | 01-10-20 | 60     |
|      | 01-15-20 | 70     |
|      | 01-20-20 | 80     |
|      | 01-21-20 | 90     |
| B    | 01-01-20 | 10     |
|      | 01-02-20 | 20     |
|      | 01-03-20 | 30     |
|      | 01-04-20 | 40     |
|      | 01-05-20 | 50     |

I have performed a groupby on Drug and want to apply a lambda function that calculates 3 metrics -

  1. Lag -> The amount for a drug x days ago
  2. Trend -> The difference between the amount for a drug today and the amount x days ago
  3. Window -> Mean of the amounts for a drug between today and x days ago (Days not seen in the dataframe are assumed to have the same value as the day that comes before them in the data ie, Jan 6th 2020 has the same value as Jan 5th 2020. Days in 2019 are considered to have the same value as on Jan 1st 2020)

Here is my desired output for the case where x=7 -

| Drug | Date     | Amount | Date 7 Days Ago | Lag | Trend | Window |
| ---- | -------- | ------ | --------------- | --- | ----- | ------ |
| A    | 01-01-20 | 10     | 12-26-19        | 10  | 0     | 10.00  |
|      | 01-02-20 | 20     | 12-27-19        | 10  | 10    | 11.43  |
|      | 01-03-20 | 30     | 12-28-19        | 10  | 20    | 14.29  |
|      | 01-04-20 | 40     | 12-29-19        | 10  | 30    | 18.57  |
|      | 01-05-20 | 50     | 12-30-19        | 10  | 40    | 24.29  |
|      | 01-10-20 | 60     | 01-04-20        | 40  | 20    | 50.00  |
|      | 01-15-20 | 70     | 01-09-20        | 50  | 20    | 60.00  |
|      | 01-20-20 | 80     | 01-14-20        | 60  | 20    | 70.00  |
|      | 01-21-20 | 90     | 01-15-20        | 70  | 20    | 74.29  |
| B    | 01-01-20 | 10     | 12-26-19        | 10  | 0     | 10.00  |
|      | 01-02-20 | 20     | 12-27-19        | 10  | 10    | 11.43  |
|      | 01-03-20 | 30     | 12-28-19        | 10  | 20    | 24.29  |
|      | 01-04-20 | 40     | 12-29-19        | 10  | 30    | 18.57  |
|      | 01-05-20 | 50     | 12-30-19        | 10  | 40    | 24.29  |

I have performed the above using for loops but I want to use a more Pandas way of doing this which I am unable to figure out.

Upvotes: 0

Views: 45

Answers (1)

betacrash
betacrash

Reputation: 59

First get your df in order, then try df.shift(...).

Upvotes: 1

Related Questions