stavrop
stavrop

Reputation: 495

Pandas: Create new columns with values from other rows

i am trying to do the following: lets assume the following df:

id, date, value
1, 2017-01-01, 0.245
1, 2017-01-02, 0.445
1, 2017-01-03, 0.235
...
1, 2017-02-01, 0.148
1, 2017-02-02, 0.985
1, 2017-02-03, 0.785
...
1, 2017-03-01, 0.258
1, 2017-03-02, 0.158
1, 2017-03-03, 0.157
...
1, 2017-04-01, 0.7758
1, 2017-04-02, 0.458
1, 2017-04-03, 0.357
...

I am trying to calculate a new value (lets name it 'outval') based on the values of the same day x-months before times a multiplier. (if a previous day is a weekend, use last Friday)

The following code seem to work, but it's a bit slow.

weight_vector = [1,2,3]
ds['outval'] = np.nan
for row in ds.itertuples():
    d_vals = [row.date - pd.DateOffset(months = i) for i in weight_vector]
    d_vals = [x - pd.DateOffset(days = 2) if x.dayofweek == 6 else x for x in d_vals]
    d_vals = [x - pd.DateOffset(days = 1) if x.dayofweek == 5 else x for x in d_vals]
    df = ds.loc[ds['date'].isin(d_vals)]
    if not df.empty and df.shape[0] == len(weight_vector):
        ds['outval'].loc[row.Index] = (weight_vector * df['value']).sum()

An idea was to add columns with the values from the same day x-months before i.e.

id, date, value,value-1, value-2, value-3
1, 2017-01-01, 0.245, nan, nan, nan
1, 2017-01-02, 0.445, nan, nan, nan
1, 2017-01-03, 0.235, nan, nan, nan
...
1, 2017-02-01, 0.148, 0.245, nan, nan
1, 2017-02-02, 0.985, 0.445, nan, nan
1, 2017-02-03, 0.785, 0.235, nan, nan
...
1, 2017-03-01, 0.258, 0.148, 0.245, nan
1, 2017-03-02, 0.158, 0.985, 0.445, nan
1, 2017-03-03, 0.157, 0.785, 0.235, nan
...
1, 2017-04-01, 0.7758, 0.258, 0.148, 0.245
1, 2017-04-02, 0.458, 0.158, 0.985, 0.445
1, 2017-04-03, 0.357, 0.157, 0.785, 0.235
...

and then just add/multiply the columns (or something similar)

ds['outval'] = ds['value-1'] * weight_vector[0] + ds['value-2'] * weight_vector[1] + ds['value-3'] * weight_vector[2] +

but i'm having trouble creating the new columns (shift won't work because there could be missing dates)

Any ideas on how to do that, or any other suggestions for improving performance?

Thanks!

Upvotes: 0

Views: 121

Answers (1)

Tural Gurbanov
Tural Gurbanov

Reputation: 742

You can do it several steps:

  1. Using apply create past-date columns which contains the right day (same day x-months before, not weekend). Let's call them a, b, and c. For example:

    from datetime import date
    df = [{"date": date(2018, 1, 2), "value": 1}, {"date": date(2018, 2, 5), "value": 3}]
    df = pd.DataFrame(df)
    
    from functools import partial
    
    def get_date(cur_date, n_month_back):
        dt = cur_date - pd.DateOffset(months=n_month_back)
        dt -= pd.DateOffset(days=max(0, dt.dayofweek - 4))
        return dt
    
    df["a"] = df["date"].apply(partial(get_date, n_month_back=1))
    
  2. Convert the date column into the index column

  3. Now you are able to find values (fast) associated with a, b, and c
  4. Use found values to calculate outval

Upvotes: 1

Related Questions