Noel Harris

Reputation: 131

How to calculate the difference of a value for multiple dates?

My data set contains multiple values for gas storage. I want to compare each of them to the value on the same date one year earlier, across multiple years. This is what my data looks like:

facility gasDayStartedOn gasInStorage full injection
UGS Haidach 2022-01-09 4.3041 37 0.00
UGS Haidach 2022-01-08 4.3263 38 0.00
UGS Haidach 2021-01-09 5.5678 43 0.00

How can I compare gasInStorage year over year on the same gasDayStartedOn date for the same facility, and store the difference in a new column within the same DataFrame?

I wrote this code:

def det_dates(df, a_date):
    b_df = df[df.gasDayStartedOn == a_date - pd.Timedelta(days=365)]
    if b_df.shape[0] != 0:
        return b_df.full.values[0]
    return None

def get_dif(df):
    for i, r in df.iterrows():
        a_date = r.gasDayStartedOn
        a_gasInStorage = r.gasInStorage
        b_gasInStorage = det_dates(df, a_date)
        
        if b_gasInStorage:
            dif_gasInStorage = a_gasInStorage - gasInStorage
        else:
            dif_gasInStorage = None
            
        df.loc[i, 'difdif'] = dif_gasInStorage

dfs = []

for com_fac, group in tqdm(data_1.groupby(['company', 'facility'])):
    g = group.copy()
    g.sort_values('gasDayStartedOn', inplace=True, ascending=False)
    get_dif(g)
    dfs.append(g)

But it's not working. Please help! This is the error I get:

from datetime import datetime, timedelta

Upvotes: 0

Views: 100

Answers (1)

Michael Gardner

Reputation: 1813

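You'll get a better answer if you provide the expected output. That said, a simple way to get the difference between one year and the next on the same calendar day is to group by day, month, and facility, then call diff. Note that diff compares each row with the previous row in its group, so the frame has to be sorted by date first, and if a year is missing for some day the difference will span more than one year.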

import pandas as pd
df = pd.read_clipboard()

df['gasDayStartedOn'] = pd.to_datetime(df.gasDayStartedOn)
df = df.sort_values(by='gasDayStartedOn', ascending=True)

group = df.groupby([df.gasDayStartedOn.dt.day, df.gasDayStartedOn.dt.month, 'facility'])

df['diff'] = group['gasInStorage'].diff()

df
Out[1]: 
      facility gasDayStartedOn  gasInStorage  full  injection    diff
2  UGS Haidach      2021-01-09        5.5678    43        0.0     NaN
1  UGS Haidach      2022-01-08        4.3263    38        0.0     NaN
0  UGS Haidach      2022-01-09        4.3041    37        0.0 -1.2637
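
If you need the value from exactly one calendar year earlier (and NaN when that date is missing, rather than a difference across a gap), a self-merge on a shifted date is one alternative. This is only a sketch under assumptions: the function name add_year_over_year_diff and the helper column gasInStorage_prev are made up for illustration, and it assumes at most one reading per facility and date. Because 29 February shifts to 28 February, the shifted table can contain a duplicate key, so duplicates are dropped before merging.

```python
import pandas as pd

def add_year_over_year_diff(df):
    """Return a copy of df with a 'diff' column: gasInStorage minus the
    value for the same facility exactly one calendar year earlier."""
    out = df.copy()
    out["gasDayStartedOn"] = pd.to_datetime(out["gasDayStartedOn"])

    # Lookup table with dates pushed forward by one year, so each row of
    # `out` lines up with the reading taken one year before it.
    prev = out[["facility", "gasDayStartedOn", "gasInStorage"]].copy()
    prev["gasDayStartedOn"] = prev["gasDayStartedOn"] + pd.DateOffset(years=1)
    prev = prev.rename(columns={"gasInStorage": "gasInStorage_prev"})
    # Feb 29 + 1 year collapses onto Feb 28; keep one row per key so the
    # left merge cannot add rows to the result.
    prev = prev.drop_duplicates(["facility", "gasDayStartedOn"], keep="last")

    out = out.merge(prev, on=["facility", "gasDayStartedOn"], how="left")
    out["diff"] = out["gasInStorage"] - out["gasInStorage_prev"]
    return out.drop(columns="gasInStorage_prev")

# Sample rows taken from the question.
df = pd.DataFrame({
    "facility": ["UGS Haidach"] * 3,
    "gasDayStartedOn": ["2022-01-09", "2022-01-08", "2021-01-09"],
    "gasInStorage": [4.3041, 4.3263, 5.5678],
    "full": [37, 38, 43],
    "injection": [0.0, 0.0, 0.0],
})
result = add_year_over_year_diff(df)
print(result)
```

Only the 2022-01-09 row has a match one year earlier, so it is the only row that gets a numeric diff; the other two stay NaN. If your data has several companies per facility name, add 'company' to the merge keys.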

Upvotes: 1
