Danish
Danish

Reputation: 2871

Create new column in data frame by interpolating other column in between a particular date range - Pandas

I have a df as shown below.

the data is like this.

     Date        y
0   2020-06-14  127
1   2020-06-15  216
2   2020-06-16  4
3   2020-06-17  90
4   2020-06-18  82
5   2020-06-19  70
6   2020-06-20  59
7   2020-06-21  48
8   2020-06-22  23
9   2020-06-23  25
10  2020-06-24  24
11  2020-06-25  22
12  2020-06-26  19
13  2020-06-27  10
14  2020-06-28  18
15  2020-06-29  157
16  2020-06-30  16
17  2020-07-01  14
18  2020-07-02  343

The code to create the data frame.

# Create a dummy dataframe
import pandas as pd
import numpy as np
y0 = [127,216,4,90, 82,70,59,48,23,25,24,22,19,10,18,157,16,14,343]
def initial_forecast(data):
    data['y'] = y0
    return data
# Initial date dataframe
df_dummy = pd.DataFrame({'Date': pd.date_range('2020-06-14', periods=19, freq='1D')})
# Dates
start_date = df_dummy.Date.iloc[1]
print(start_date)
end_date = df_dummy.Date.iloc[17]
print(end_date)
# Adding y0 in the dataframe
df_dummy = initial_forecast(df_dummy)
df_dummy

From the above I would like to interpolate the data for a particular date range.

I would like to interpolate(linear) between 2020-06-17 to 2020-06-27.

ie from 2020-06-17 to 2020-06-27 'y' values changes from 90 to 10 in 10 steps. so at an average in each step it reduces 8.

ie (90-10)/10(number of steps) = 8 in each steps

The expected output:

     Date        y       y_new
0   2020-06-14  127      127
1   2020-06-15  216      216
2   2020-06-16  4        4
3   2020-06-17  90       90
4   2020-06-18  82       82
5   2020-06-19  70       74
6   2020-06-20  59       66
7   2020-06-21  48       58
8   2020-06-22  23       50
9   2020-06-23  25       42
10  2020-06-24  24       34
11  2020-06-25  22       26
12  2020-06-26  19       18  
13  2020-06-27  10       10
14  2020-06-28  18       18
15  2020-06-29  157      157
16  2020-06-30  16       16
17  2020-07-01  14       14
18  2020-07-02  343      343

Note: In the remaining date range y_new value should be same as y value.

I tried below code, that is not giving desired output

# Function
def df_interpolate(df, start_date, end_date): 
    df["Date"]=pd.to_datetime(df["Date"])
    df.loc[(df['Date'] >= start_date) & (df['Date'] <= end_date), 'y_new'] = np.nan
    df['y_new'] = df['y'].interpolate().round()
    return df
df1 = df_interpolate(df_dummy, '2020-06-17', '2020-06-27')

Upvotes: 1

Views: 42

Answers (1)

ScootCork
ScootCork

Reputation: 3676

With some tweaks to your function it works. np.where to create the new column, removing the = from your conditionals, and casting to int as per your expected output.

def df_interpolate(df, start_date, end_date): 
    df["Date"] = pd.to_datetime(df["Date"])
    df['y_new'] = np.where((df['Date'] > start_date) & (df['Date'] < end_date), np.nan, df['y'])
    df['y_new'] = df['y_new'].interpolate().round().astype(int)
    return df

         Date    y  y_new
0  2020-06-14  127    127
1  2020-06-15  216    216
2  2020-06-16    4      4
3  2020-06-17   90     90
4  2020-06-18   82     82
5  2020-06-19   70     74
6  2020-06-20   59     66
7  2020-06-21   48     58
8  2020-06-22   23     50
9  2020-06-23   25     42
10 2020-06-24   24     34
11 2020-06-25   22     26
12 2020-06-26   19     18
13 2020-06-27   10     10
14 2020-06-28   18     18
15 2020-06-29  157    157
16 2020-06-30   16     16
17 2020-07-01   14     14
18 2020-07-02  343    343

Upvotes: 2

Related Questions