Pandas: Create new columns with values from other rows

Question

i am trying to do the following: lets assume the following df:

id, date, value
1, 2017-01-01, 0.245
1, 2017-01-02, 0.445
1, 2017-01-03, 0.235
...
1, 2017-02-01, 0.148
1, 2017-02-02, 0.985
1, 2017-02-03, 0.785
...
1, 2017-03-01, 0.258
1, 2017-03-02, 0.158
1, 2017-03-03, 0.157
...
1, 2017-04-01, 0.7758
1, 2017-04-02, 0.458
1, 2017-04-03, 0.357
...

I am trying to calculate a new value (lets name it 'outval') based on the values of the same day x-months before times a multiplier. (if a previous day is a weekend, use last Friday)

The following code seem to work, but it's a bit slow.

weight_vector = [1,2,3]
ds['outval'] = np.nan
for row in ds.itertuples():
    d_vals = [row.date - pd.DateOffset(months = i) for i in weight_vector]
    d_vals = [x - pd.DateOffset(days = 2) if x.dayofweek == 6 else x for x in d_vals]
    d_vals = [x - pd.DateOffset(days = 1) if x.dayofweek == 5 else x for x in d_vals]
    df = ds.loc[ds['date'].isin(d_vals)]
    if not df.empty and df.shape[0] == len(weight_vector):
        ds['outval'].loc[row.Index] = (weight_vector * df['value']).sum()

An idea was to add columns with the values from the same day x-months before i.e.

id, date, value,value-1, value-2, value-3
1, 2017-01-01, 0.245, nan, nan, nan
1, 2017-01-02, 0.445, nan, nan, nan
1, 2017-01-03, 0.235, nan, nan, nan
...
1, 2017-02-01, 0.148, 0.245, nan, nan
1, 2017-02-02, 0.985, 0.445, nan, nan
1, 2017-02-03, 0.785, 0.235, nan, nan
...
1, 2017-03-01, 0.258, 0.148, 0.245, nan
1, 2017-03-02, 0.158, 0.985, 0.445, nan
1, 2017-03-03, 0.157, 0.785, 0.235, nan
...
1, 2017-04-01, 0.7758, 0.258, 0.148, 0.245
1, 2017-04-02, 0.458, 0.158, 0.985, 0.445
1, 2017-04-03, 0.357, 0.157, 0.785, 0.235
...

and then just add/multiply the columns (or something similar)

ds['outval'] = ds['value-1'] * weight_vector[0] + ds['value-2'] * weight_vector[1] + ds['value-3'] * weight_vector[2] +

but i'm having trouble creating the new columns (shift won't work because there could be missing dates)

Any ideas on how to do that, or any other suggestions for improving performance?

Thanks!

Tural Gurbanov · Accepted Answer

You can do it several steps:

Using apply create past-date columns which contains the right day (same day x-months before, not weekend). Let's call them a, b, and c. For example:

from datetime import date
df = [{"date": date(2018, 1, 2), "value": 1}, {"date": date(2018, 2, 5), "value": 3}]
df = pd.DataFrame(df)

from functools import partial

def get_date(cur_date, n_month_back):
    dt = cur_date - pd.DateOffset(months=n_month_back)
    dt -= pd.DateOffset(days=max(0, dt.dayofweek - 4))
    return dt

df["a"] = df["date"].apply(partial(get_date, n_month_back=1))

Convert the date column into the index column
Now you are able to find values (fast) associated with a, b, and c
Use found values to calculate outval

Pandas: Create new columns with values from other rows

Answers (1)

Related Questions