user479947
user479947

Reputation:

Reindexing missing dates in pandas but receiving NaN values

In pandas, I'm creating a dataframe like:

    df = pd.read_csv(file_path)[['timestamp', 'close']]
    df['close'] = df['close'].astype(float)
    df = df.set_index('timestamp')

The data looks like:

                    close
timestamp                 
2019-04-18          203.86
2019-04-17          203.13
2019-04-16          199.25
2019-04-15          199.23
2019-04-12          198.87

Now, I would like to fill in the missing timestamp and close values with linear interpolation from the nearest neighbors.

I created a list of the missing dates with:

dates = pd.date_range(start=df['timestamp'].min(), end=df['timestamp'].max())

Then reindexed with:

df = df.reindex(dates).iloc[::-1]

But this produced:

                      close
timestamp
2019-04-18             NaN
2019-04-17             NaN
2019-04-16             NaN
2019-04-15             NaN
2019-04-14             NaN

I was hoping this would at least copy over the previous values (although, I've yet to find a good way to get the missing close value interpolation handled smoothly.) How would I phrase this in pandas?

Upvotes: 0

Views: 603

Answers (3)

Scott Boston
Scott Boston

Reputation: 153460

Try:

df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.set_index('timestamp')
df.resample('D').interpolate(method='index')

Output:

             close
timestamp         
2019-04-12  198.87
2019-04-13  198.99
2019-04-14  199.11
2019-04-15  199.23
2019-04-16  199.25
2019-04-17  203.13
2019-04-18  203.86

Upvotes: 1

BENY
BENY

Reputation: 323286

Your index should not be datetime format , if you just using read_csv and does not pass parse_dates

df = df.set_index('timestamp')
df.index=pd.to_datetime(df.index)

After convert it , you should be fine with reindex


Another solution will be

df = pd.read_csv(file_path,parse_dates = 'timestamp')[['timestamp', 'close']]

After we finish convert the datetime,

We using interpolate fillna

df.loc[dates[::-1]].interpolate('index')

Upvotes: 1

Quang Hoang
Quang Hoang

Reputation: 150745

Try this:

df.reindex(dates).align(df)[1]

Output:

+-------------+--------+
|             | close  |
+-------------+--------+
| 2019-04-12  | 198.87 |
| 2019-04-13  | NaN    |
| 2019-04-14  | NaN    |
| 2019-04-15  | 199.23 |
| 2019-04-16  | 199.25 |
| 2019-04-17  | 203.13 |
| 2019-04-18  | 203.86 |
+-------------+--------+

Upvotes: 0

Related Questions