Reputation: 131
My data set contains multiple gas storage values. I want to compare each of them to the value on the exact same date one year earlier, across multiple years. This is what my data looks like:
facility | gasDayStartedOn | gasInStorage | full | injection |
---|---|---|---|---|
UGS Haidach | 2022-01-09 | 4.3041 | 37 | 0.00 |
UGS Haidach | 2022-01-08 | 4.3263 | 38 | 0.00 |
UGS Haidach | 2021-01-09 | 5.5678 | 43 | 0.00 |
How can I calculate/compare gasInStorage for each year of gasDayStartedOn for the same facility and store the result in a new column within the same DataFrame?
I have written this code:
def det_dates(df, a_date):
    b_df = df[df.gasDayStartedOn == a_date - pd.Timedelta(days=365)]
    if b_df.shape[0] != 0:
        return b_df.full.values[0]
    return None

def get_dif(df):
    for i, r in df.iterrows():
        a_date = r.gasDayStartedOn
        a_gasInStorage = r.gasInStorage
        b_gasInStorage = det_dates(df, a_date)
        if b_gasInStorage:
            dif_gasInStorage = a_gasInStorage - gasInStorage
        else:
            dif_gasInStorage = None
        df.loc[i, 'difdif'] = dif_gasInStorage

dfs = []
for com_fac, group in tqdm(data_1.groupby(['company', 'facility'])):
    g = group.copy()
    g.sort_values('gasDayStartedOn', inplace=True, ascending=False)
    get_dif(g)
    dfs.append(g)
But it's not working. Please help! This is the error I get:
from datetime import datetime, timedelta
Upvotes: 0
Views: 100
Reputation: 1813
You'll get a better answer if you provide the expected output. But a simple way to check the difference from one year to the next on the same calendar day is to use groupby and diff.
import pandas as pd
df = pd.read_clipboard()
df['gasDayStartedOn'] = pd.to_datetime(df.gasDayStartedOn)
df = df.sort_values(by='gasDayStartedOn', ascending=True)
group = df.groupby([df.gasDayStartedOn.dt.day, df.gasDayStartedOn.dt.month, 'facility'])
df['diff'] = group['gasInStorage'].diff()
df
Out[1]:
      facility gasDayStartedOn  gasInStorage  full  injection    diff
2  UGS Haidach      2021-01-09        5.5678    43        0.0     NaN
1  UGS Haidach      2022-01-08        4.3263    38        0.0     NaN
0  UGS Haidach      2022-01-09        4.3041    37        0.0 -1.2637
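One caveat: diff compares each row with the previous row in its group, so if a year is missing for some date, the difference spans more than one year. If you need the value from exactly one year earlier, a self-merge on a shifted date is one option. The following is only a minimal sketch, assuming gasDayStartedOn is already a datetime column and each facility has at most one row per day. The names prev, gasInStorage_prev and yoy_diff are made up for illustration.

```python
import pandas as pd

# Sample rows from the question (only the columns needed here)
df = pd.DataFrame({
    "facility": ["UGS Haidach"] * 3,
    "gasDayStartedOn": pd.to_datetime(["2022-01-09", "2022-01-08", "2021-01-09"]),
    "gasInStorage": [4.3041, 4.3263, 5.5678],
})

# Copy of the key columns with every date shifted forward by one year,
# so the 2021-01-09 row lines up with 2022-01-09 in the merge below.
prev = df[["facility", "gasDayStartedOn", "gasInStorage"]].copy()
prev["gasDayStartedOn"] = prev["gasDayStartedOn"] + pd.DateOffset(years=1)
prev = prev.rename(columns={"gasInStorage": "gasInStorage_prev"})

# A left merge keeps every original row; rows with no value one year
# earlier get NaN in gasInStorage_prev and therefore in yoy_diff.
out = df.merge(prev, on=["facility", "gasDayStartedOn"], how="left")
out["yoy_diff"] = out["gasInStorage"] - out["gasInStorage_prev"]

print(out[["facility", "gasDayStartedOn", "gasInStorage", "yoy_diff"]])
```

On the sample data this gives the same -1.2637 for 2022-01-09 and NaN for the other two rows. Note that DateOffset(years=1) maps Feb 29 to Feb 28 of the following year, so leap days may need separate handling.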
Upvotes: 1