Reputation:
In pandas, I'm creating a dataframe like:
df = pd.read_csv(file_path)[['timestamp', 'close']]
df['close'] = df['close'].astype(float)
df = df.set_index('timestamp')
The data looks like:
close
timestamp
2019-04-18 203.86
2019-04-17 203.13
2019-04-16 199.25
2019-04-15 199.23
2019-04-12 198.87
Now, I would like to fill in the missing timestamp
and close
values with linear interpolation from the nearest neighbors.
I created a list of the missing dates with:
dates = pd.date_range(start=df['timestamp'].min(), end=df['timestamp'].max())
Then reindexed with:
df = df.reindex(dates).iloc[::-1]
But this produced:
close
timestamp
2019-04-18 NaN
2019-04-17 NaN
2019-04-16 NaN
2019-04-15 NaN
2019-04-14 NaN
I was hoping this would at least copy over the previous values (although, I've yet to find a good way to get the missing close value interpolation handled smoothly.) How would I phrase this in pandas?
Upvotes: 0
Views: 603
Reputation: 153460
Try:
df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.set_index('timestamp')
df.resample('D').interpolate(method='index')
Output:
close
timestamp
2019-04-12 198.87
2019-04-13 198.99
2019-04-14 199.11
2019-04-15 199.23
2019-04-16 199.25
2019-04-17 203.13
2019-04-18 203.86
Upvotes: 1
Reputation: 323286
Your index should not be datetime format , if you just using read_csv and does not pass parse_dates
df = df.set_index('timestamp')
df.index=pd.to_datetime(df.index)
After convert it , you should be fine with reindex
Another solution will be
df = pd.read_csv(file_path,parse_dates = 'timestamp')[['timestamp', 'close']]
After we finish convert the datetime,
We using interpolate
fillna
df.loc[dates[::-1]].interpolate('index')
Upvotes: 1
Reputation: 150745
Try this:
df.reindex(dates).align(df)[1]
Output:
+-------------+--------+
| | close |
+-------------+--------+
| 2019-04-12 | 198.87 |
| 2019-04-13 | NaN |
| 2019-04-14 | NaN |
| 2019-04-15 | 199.23 |
| 2019-04-16 | 199.25 |
| 2019-04-17 | 203.13 |
| 2019-04-18 | 203.86 |
+-------------+--------+
Upvotes: 0