Reputation: 437
My multi-indexed data frame is as follows:
df.head()
Output
Unit Timestamp
1 2016-06-01 00:00:00 225894.9
2016-06-01 01:00:00 225895.9
2016-06-01 02:00:00 225896.9
2016-06-01 03:00:00 225897.9
2016-06-01 04:00:00 225898.9
df.tail()
Output
Unit Timestamp
16 2016-06-30 18:00:00 150543.1
2016-06-30 19:00:00 150544.1
2016-06-30 21:00:00 150546.1
2016-06-30 22:00:00 150547.1
2016-06-30 23:00:00 150548.1
That is, one month's worth of hourly data for 16 units.
I want to pick out one day's data for a given unit. I have an array of the days that appear in the dataframe,
days = array([datetime.date(2016, 6, 1), datetime.date(2016, 6, 2),
datetime.date(2016, 6, 3), datetime.date(2016, 6, 4),...
etc etc
If I do df.ix[5,'2016-06-10']
, I get exactly what I want, that is the day's for unit 5 that date
Output
Timestamp
2016-06-10 00:00:00 152364.6
2016-06-10 01:00:00 152365.7
2016-06-10 02:00:00 152366.6
...
2016-06-10 21:00:00 152386.6
2016-06-10 22:00:00 152386.6
2016-06-10 23:00:00 152387.6
But the same indexing doesn't seem to work when I have a datetime.date object instead of a string. For example:
tenth = days[9]
evaulates to datetime.date(2016, 6, 10)
, so far so good.
However when I do df.ix[5,tenth]
I get "KeyError: u'no item named 2016-06-10'"
So, the '2016-06-10'
"does what I mean" and returns all the data on that day. How can I do the same if with a datetime.date?
Upvotes: 1
Views: 1481
Reputation: 294198
When you pass a string that looks like a datetime to the pandas selector ix
, it uses it like a condition and returns all elements that satisfy. In this case, the string you are using evaluates to a day. Pandas runs ix
and returns all rows within that day. When you pass the datetime object, it looks for an exact match. You don't have one since all your datetimes are non-zero hour times.
Consider:
s = pd.Series(range(5), pd.date_range('2016-03-31 01:00:00', periods=5, freq='H'))
s
2016-03-31 01:00:00 0
2016-03-31 02:00:00 1
2016-03-31 03:00:00 2
2016-03-31 04:00:00 3
2016-03-31 05:00:00 4
Freq: H, dtype: int64
get all elements on '2016-03-31'
s.ix['2016-03-31']
2016-03-31 01:00:00 0
2016-03-31 02:00:00 1
2016-03-31 03:00:00 2
2016-03-31 04:00:00 3
2016-03-31 05:00:00 4
Freq: H, dtype: int64
Now assign a datetime
date = pd.to_datetime('2016-03-31')
s.ix[date]
And...
KeyError: Timestamp('2016-03-31 00:00:00')
If instead we assigned a datetime where we do have an exact match, we should get a result.
date2 = pd.to_datetime('2016-03-31 02:00:00')
Then
s.ix[date2]
Returns:
1
That worked!
To use datetimes and make it look pretty, use a function:
d2s = lambda d: d.strftime('%Y-%d-%m')
Then
s.ix[d2s(date)]
2016-03-31 01:00:00 0
2016-03-31 02:00:00 1
2016-03-31 03:00:00 2
2016-03-31 04:00:00 3
2016-03-31 05:00:00 4
Freq: H, dtype: int64
Upvotes: 3
Reputation: 1507
Let me suggest you a different method rather than using ix. Why don't you directly use a range query?
df = df[df.index.get_level_values('Unit') == 6 &
(df.index.get_level_values('Timestamp') >= tenth &
(df.index.get_level_values('Timestamp') <= tenth)]
Can you just try whether this is working?
Upvotes: 1