Reputation: 131
Given a timeseries, s
, with a datetime index I expected to be able to index the timeseries by the date string. Am I misunderstanding how this should work?
import pandas as pd
url = 'http://ichart.finance.yahoo.com/table.csvs=SPY&d=12&e=4&f=2012&g=d&a=01&b=01&c=2001&ignore=.csv'
df = pd.read_csv(url, index_col='Date', parse_dates=True)
s = df['Close']
s['2012-12-04']
Result:
TimeSeriesError Traceback (most recent call last)
<ipython-input-244-e2ccd4ecce94> in <module>()
2 df = pd.read_csv(url, index_col='Date', parse_dates=True)
3 s = df['Close']
----> 4 s['2012-12-04']
G:\Python27-32\lib\site-packages\pandas\core\series.pyc in __getitem__(self, key)
468 def __getitem__(self, key):
469 try:
--> 470 return self.index.get_value(self, key)
471 except InvalidIndexError:
472 pass
G:\Python27-32\lib\site-packages\pandas\tseries\index.pyc in get_value(self, series, key)
1030
1031 try:
-> 1032 loc = self._get_string_slice(key)
1033 return series[loc]
1034 except (TypeError, ValueError, KeyError):
G:\Python27-32\lib\site-packages\pandas\tseries\index.pyc in _get_string_slice(self, key)
1077 asdt, parsed, reso = parse_time_string(key, freq)
1078 key = asdt
-> 1079 loc = self._partial_date_slice(reso, parsed)
1080 return loc
1081
G:\Python27-32\lib\site-packages\pandas\tseries\index.pyc in _partial_date_slice(self, reso, parsed)
992 def _partial_date_slice(self, reso, parsed):
993 if not self.is_monotonic:
--> 994 raise TimeSeriesError('Partial indexing only valid for ordered '
995 'time series.')
996
TimeSeriesError: Partial indexing only valid for ordered time series.
To be more specific (and perhaps pedantic..), what's the difference between the 2 Timeseries here:
import pandas as pd
url = 'http://ichart.finance.yahoo.com/table.csv? s=SPY&d=12&e=4&f=2012&g=d&a=01&b=01&c=2001&ignore=.csv'
s = pd.read_csv(url, index_col='Date', parse_dates=True)['Close']
rng = date_range(start='2011-01-01', end='2011-12-31')
ts = Series(randn(len(rng)), index=rng)
print ts.__class__
print ts.index[0].__class__
print s1.__class__
print s1.index[0].__class__
print ts[ts.index[0]]
print s[s.index[0]]
print ts['2011-01-01']
try:
print s['2012-12-05']
except:
print "doesn't work"
Result:
<class 'pandas.core.series.TimeSeries'>
<class 'pandas.lib.Timestamp'>
<class 'pandas.core.series.TimeSeries'>
<class 'pandas.lib.Timestamp'>
-0.608673793503
141.5
-0.608673793503
doesn't work
Upvotes: 3
Views: 4354
Reputation: 179
While the pandas tutorial was instructive, I think the original question posed deserves a direct answer. I ran into the same problem converting Yahoo chart info to a DataFrame that could be sliced, etc. I found that the only thing that was required was:
import pandas as pd
import datetime as dt
def dt_parser(date):
return dt.datetime.strptime(date, '%Y-%m-%d') + dt.timedelta(hours=16)
url = 'http://ichart.finance.yahoo.com/table.csvs=SPY&d=12&e=4&f=2012&g=d&a=01&b=01&c=2001&ignore=.csv'
df = pd.read_csv(url, index_col=0, parse_dates=True, date_parser=dt_parser)
df.sort_index(inplace=True)
s = df['Close']
s['2012-12-04'] # now should work
The "trick" was to include my own date_parser. I'm guessing that there is some better way to do this within read_csv, but this at least produced a DataFrame that was indexed and could be sliced.
Upvotes: 0
Reputation: 375377
When the time series is not ordered and you give a partial timestamp (e.g. a date, rather than a datetime) it's not clear which datetime should be selected.
It can't be assumed that there is only one datetime object per date, although there are in this example, here there are several options but it seems safer to throw an error here rather than guess a users motives. (We could return a series/list similar to .ix['2011-01']
, but this may be confusing if returning a number in other cases. We could try to return a "closest match"... but this doesn't really make sense either.)
In an ordered case it's easier, we pick the first datetime with the selected date.
You can see in this behaviour in this simple example:
import pandas as pd
from numpy.random import randn
from random import shuffle
rng = pd.date_range(start='2011-01-01', end='2011-12-31')
rng2 = list(rng)
shuffle(rng2) # not in order
rng3 = list(rng)
del rng3[20] # in order, but no freq
ts = pd.Series(randn(len(rng)), index=rng)
ts2 = pd.Series(randn(len(rng)), index=rng2)
ts3 = pd.Series(randn(len(rng)-1), index=rng3)
ts.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-01 00:00:00, ..., 2011-12-31 00:00:00]
Length: 365, Freq: D, Timezone: None
ts['2011-01-01']
# -1.1454418070543406
ts2.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-04-16 00:00:00, ..., 2011-03-10 00:00:00]
Length: 365, Freq: None, Timezone: None
ts2['2011-01-01']
#...error which you describe
TimeSeriesError: Partial indexing only valid for ordered time series
ts3.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-01 00:00:00, ..., 2011-12-31 00:00:00]
Length: 364, Freq: None, Timezone: None
ts3['2011-01-01']
1.7631554507355987
rng4 = pd.date_range(start='2011-01-01', end='2011-01-31', freq='H')
ts4 = pd.Series(randn(len(rng4)), index=rng4)
ts4['2011-01-01'] == ts4[0]
# True # it picks the first element with that date
I don't think this is a bug, nevertheless I posted it as an issue on github.
Upvotes: 1
Reputation: 6561
Try indexing with a Timestamp
object:
>>> import pandas as pd
>>> from pandas.lib import Timestamp
>>> url = 'http://ichart.finance.yahoo.com/table.csv?s=SPY&d=12&e=4&f=2012&g=d&a=01&b=01&c=2001&ignore=.csv'
>>> df = pd.read_csv(url, index_col='Date', parse_dates=True)
>>> s = df['Close']
>>> s[Timestamp('2012-12-04')]
141.25
Upvotes: 2