Reputation: 515
I have a DataFrame td consisting of the following columns:
In [111]: td.head(5)
Out[111]:
Date Time Price
0 2015-09-21 00:01:26 4303.00
1 2015-09-21 00:01:33 4303.00
2 2015-09-21 00:02:21 4303.50
3 2015-09-21 00:02:21 4303.50
4 2015-09-21 00:02:31 4303.25
My goal is to have a Series of Price values indexed by the combined date and time.
I tried:
s = pd.Series(td['Price'], index=pd.to_datetime(td['Date'] + ' ' + td['Time']))
But I get this result:
>>> s
2015-09-21 00:01:26 NaN
2015-09-21 00:01:33 NaN
2015-09-21 00:02:21 NaN
2015-09-21 00:02:21 NaN
..
2015-09-25 16:59:58 NaN
2015-09-25 16:59:58 NaN
2015-09-25 16:59:58 NaN
2015-09-25 16:59:59 NaN
Name: Price, dtype: float64
All the values from the "Price" column are NaN. Any hint as to what I am doing wrong?
Upvotes: 4
Views: 388
Reputation: 176850
When creating a Series from a DataFrame column and passing in an index, the column will be reindexed according to the new index.
In your case, none of the labels in the newly created Datetime index were originally used to index the column td['Price'], so a Series of missing (NaN) values is returned.
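The reindexing described above can be reproduced on a small frame. This is only a sketch: the two-row td below is a stand-in built from the first rows shown in the question, and the names new_index, s and reindexed are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row stand-in for td, using values from the question.
td = pd.DataFrame({
    "Date": ["2015-09-21", "2015-09-21"],
    "Time": ["00:01:26", "00:01:33"],
    "Price": [4303.00, 4303.00],
})

# Combine the two string columns and parse them into a DatetimeIndex.
new_index = pd.DatetimeIndex(pd.to_datetime(td["Date"] + " " + td["Time"]))

# td["Price"] is labelled 0 and 1. Passing it together with a new index
# makes pandas look up each new label in the old labels; none match,
# so every value comes back missing.
s = pd.Series(td["Price"], index=new_index)

# An explicit reindex shows the same alignment step on its own.
reindexed = td["Price"].reindex(new_index)

print(s)
print(reindexed)
```

Both printed Series should contain only NaN, because the integer labels 0 and 1 never appear among the datetime labels.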
The easiest solution is to pass in td['Price'].values instead:
>>> pd.Series(td['Price'].values, index=pd.to_datetime(td['Date']+' '+td['Time']))
2015-09-21 00:01:26 4303.00
2015-09-21 00:01:33 4303.00
2015-09-21 00:02:21 4303.50
2015-09-21 00:02:21 4303.50
2015-09-21 00:02:31 4303.25
...
Using td['Price'].values means that the values from the column are in a NumPy array: this has no index, so pandas does not try to reindex the values.
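As a rough illustration of the same idea, the sketch below builds the Series from a plain array and, as an alternative not mentioned above, relabels the existing column with set_axis. It assumes a pandas version that provides Series.to_numpy() (0.24 or later); the small td and the names new_index, s1 and s2 are illustrative only.

```python
import pandas as pd

# Hypothetical two-row stand-in for td, using values from the question.
td = pd.DataFrame({
    "Date": ["2015-09-21", "2015-09-21"],
    "Time": ["00:01:26", "00:02:21"],
    "Price": [4303.00, 4303.50],
})

new_index = pd.DatetimeIndex(pd.to_datetime(td["Date"] + " " + td["Time"]))

# Option 1: pass a bare NumPy array. It carries no labels, so pandas
# pairs values and index entries by position instead of reindexing.
s1 = pd.Series(td["Price"].to_numpy(), index=new_index, name="Price")

# Option 2 (an alternative, not part of the answer above): keep the
# Series and swap its labels for the new index.
s2 = td["Price"].set_axis(new_index)

print(s1)
```

Either way the prices stay in their original order and are labelled by the parsed timestamps; both options require the index to have the same length as the column.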
Upvotes: 2