How to index a day's range of rows in a Dataframe, using a datetime.date?

Question

My multi-indexed data frame is as follows:

df.head()
                         Output  
Unit    Timestamp                                               
1   2016-06-01 00:00:00  225894.9
    2016-06-01 01:00:00  225895.9
    2016-06-01 02:00:00  225896.9   
    2016-06-01 03:00:00  225897.9 
    2016-06-01 04:00:00  225898.9 

df.tail()

                         Output 
Unit Timestamp                                               
16  2016-06-30 18:00:00  150543.1        
    2016-06-30 19:00:00  150544.1        
    2016-06-30 21:00:00  150546.1        
    2016-06-30 22:00:00  150547.1     
    2016-06-30 23:00:00  150548.1

That is, one month's worth of hourly data for 16 units.

I want to pick out one day's data for a given unit. I have an array of the days that appear in the dataframe,

days = array([datetime.date(2016, 6, 1), datetime.date(2016, 6, 2), datetime.date(2016, 6, 3), datetime.date(2016, 6, 4),... etc etc

If I do df.ix[5,'2016-06-10'], I get exactly what I want, that is the day's for unit 5 that date

                       Output
Timestamp
2016-06-10 00:00:00    152364.6
2016-06-10 01:00:00    152365.7
2016-06-10 02:00:00    152366.6
...
2016-06-10 21:00:00    152386.6
2016-06-10 22:00:00    152386.6
2016-06-10 23:00:00    152387.6

But the same indexing doesn't seem to work when I have a datetime.date object instead of a string. For example:

tenth = days[9] evaulates to datetime.date(2016, 6, 10), so far so good.

However when I do df.ix[5,tenth] I get "KeyError: u'no item named 2016-06-10'"

So, the '2016-06-10' "does what I mean" and returns all the data on that day. How can I do the same if with a datetime.date?

piRSquared · Accepted Answer

When you pass a string that looks like a datetime to the pandas selector ix, it uses it like a condition and returns all elements that satisfy. In this case, the string you are using evaluates to a day. Pandas runs ix and returns all rows within that day. When you pass the datetime object, it looks for an exact match. You don't have one since all your datetimes are non-zero hour times.

Consider:

s = pd.Series(range(5), pd.date_range('2016-03-31 01:00:00', periods=5, freq='H'))
s

2016-03-31 01:00:00    0
2016-03-31 02:00:00    1
2016-03-31 03:00:00    2
2016-03-31 04:00:00    3
2016-03-31 05:00:00    4
Freq: H, dtype: int64

get all elements on '2016-03-31'

s.ix['2016-03-31']

2016-03-31 01:00:00    0
2016-03-31 02:00:00    1
2016-03-31 03:00:00    2
2016-03-31 04:00:00    3
2016-03-31 05:00:00    4
Freq: H, dtype: int64

Now assign a datetime

date = pd.to_datetime('2016-03-31')

s.ix[date]

And...

KeyError: Timestamp('2016-03-31 00:00:00')

If instead we assigned a datetime where we do have an exact match, we should get a result.

date2 = pd.to_datetime('2016-03-31 02:00:00')

Then

s.ix[date2]

Returns:

That worked!

To use datetimes and make it look pretty, use a function:

d2s = lambda d: d.strftime('%Y-%d-%m')

Then

s.ix[d2s(date)]

2016-03-31 01:00:00    0
2016-03-31 02:00:00    1
2016-03-31 03:00:00    2
2016-03-31 04:00:00    3
2016-03-31 05:00:00    4
Freq: H, dtype: int64

How to index a day's range of rows in a Dataframe, using a datetime.date?

Answers (2)

Related Questions

How to index a day&#39;s range of rows in a Dataframe, using a datetime.date?

Answers (2)

Related Questions

How to index a day's range of rows in a Dataframe, using a datetime.date?