Reputation: 131

How to complement missing data in a time series using python?

I have a dataframe which has price for some days of the year, now I want to make a bigger datframe that shows all the days from the start of the year to some specific date. Then use the price for days that I already have in my original dataframe, and fill in between the days that have no price with the last price from that date.

as an example:

df = pd.DataFrame({
    'timestamps': pd.to_datetime(
        ['2021-01-04', '2021-01-07', '2021-01-14', '2021-01-21', '2021-01-28', '2021-01-29', 
'2021-02-04', '2021-02-12', '2021-02-18', '2021-02-25']),
    'LastPrice':['113.4377','115.0741','115.5709','116.5197','116.681','116.4198','117.5749','117.2175',
 '117.0541','117.5977']})

I want my new date series be like this

index=pd.date_range('2021-01-01', '2021-02-28')

dfObj = pd.DataFrame(columns=['new_Date','new_LastPrice'])
dfObj['new_Date'] = index

so, ideally I should have something like the following dataframe.(just the top part)

    new_Date    new_LastPrice
0   2021-01-01  0
1   2021-01-02  0
2   2021-01-03  0
3   2021-01-04  113.4377
4   2021-01-05  113.4377
5   2021-01-06  113.4377
6   2021-01-07  115.0741
7   2021-01-08  115.0741
8   2021-01-09  115.0741
9   2021-01-10  115.0741
10  2021-01-11  115.0741
11  2021-01-12  115.0741
12  2021-01-13  115.0741

Can anyone here help me with this, please?

Upvotes: 3

Answers (3)

sammywemmy

Reputation: 28729

You could use the complete function from pyjanitor to abstract the process of exposing the missing values/rows :

#pip install pyjanitor
import janitor
import pandas as pd
index=pd.date_range('2021-01-01', '2021-02-28')
# assign the new values as a dictionary,
# with the column name as the key
new_dates = {"timestamps": index} # accepts a callable too
(df.complete([new_dates])
   .ffill()
   .fillna(0)
   .set_axis(['new_Date', 'new_LastPrice'],
             axis = 'columns')
   .head(10) # shows the first 10 rows, you can get rid of this line
 )

    new_Date new_LastPrice
0 2021-01-01             0
1 2021-01-02             0
2 2021-01-03             0
3 2021-01-04      113.4377
4 2021-01-05      113.4377
5 2021-01-06      113.4377
6 2021-01-07      115.0741
7 2021-01-08      115.0741
8 2021-01-09      115.0741
9 2021-01-10      115.0741

Upvotes: 1

Ashutosh

Reputation: 159

This will work for your case: ( merged the dataframes and used fillna as ffill to fill missing values followed by filna as 0 for initial records )

df = pd.DataFrame({
    'timestamps': pd.to_datetime(
        ['2021-01-04', '2021-01-07', '2021-01-14', '2021-01-21', '2021-01-28', '2021-01-29', 
'2021-02-04', '2021-02-12', '2021-02-18', '2021-02-25']),
    'LastPrice':['113.4377','115.0741','115.5709','116.5197','116.681','116.4198','117.5749','117.2175',
 '117.0541','117.5977']})

index=pd.date_range('2021-01-01', '2021-02-28')


dfObj = pd.DataFrame(columns=['new_Date','new_LastPrice'])
dfObj['new_Date'] = index
dfObj = dfObj.merge(df,how='left', left_on='new_Date', right_on='timestamps')
dfObj = dfObj[['new_Date', 'LastPrice']]
dfObj = dfObj.fillna(method='ffill')
dfObj = dfObj.fillna(0)

Output:

new_Date    LastPrice
0   2021-01-01  0
1   2021-01-02  0
2   2021-01-03  0
3   2021-01-04  113.4377
4   2021-01-05  113.4377
5   2021-01-06  113.4377
6   2021-01-07  115.0741
7   2021-01-08  115.0741
8   2021-01-09  115.0741
9   2021-01-10  115.0741
10  2021-01-11  115.0741
...

Upvotes: 1

jezrael

Reputation: 863541

Use DataFrame.reindex with method='ffill':

index=pd.date_range('2021-01-01', '2021-02-28')

dfObj = (df.set_index('timestamps')
           .reindex(index, method='ffill')
           .fillna(0)
           .add_prefix('new_')
           .rename_axis('new_Date')
           .reset_index())
print (dfObj.head(13))
     new_Date new_LastPrice
0  2021-01-01             0
1  2021-01-02             0
2  2021-01-03             0
3  2021-01-04      113.4377
4  2021-01-05      113.4377
5  2021-01-06      113.4377
6  2021-01-07      115.0741
7  2021-01-08      115.0741
8  2021-01-09      115.0741
9  2021-01-10      115.0741
10 2021-01-11      115.0741
11 2021-01-12      115.0741
12 2021-01-13      115.0741

Upvotes: 3

How to complement missing data in a time series using python?

Answers (3)

Related Questions