djnz0feh
djnz0feh

Reputation: 437

How to index a day's range of rows in a Dataframe, using a datetime.date?

My multi-indexed data frame is as follows:

df.head()
                         Output  
Unit    Timestamp                                               
1   2016-06-01 00:00:00  225894.9
    2016-06-01 01:00:00  225895.9
    2016-06-01 02:00:00  225896.9   
    2016-06-01 03:00:00  225897.9 
    2016-06-01 04:00:00  225898.9 

df.tail()

                         Output 
Unit Timestamp                                               
16  2016-06-30 18:00:00  150543.1        
    2016-06-30 19:00:00  150544.1        
    2016-06-30 21:00:00  150546.1        
    2016-06-30 22:00:00  150547.1     
    2016-06-30 23:00:00  150548.1  

That is, one month's worth of hourly data for 16 units.

I want to pick out one day's data for a given unit. I have an array of the days that appear in the dataframe,

days = array([datetime.date(2016, 6, 1), datetime.date(2016, 6, 2), datetime.date(2016, 6, 3), datetime.date(2016, 6, 4),... etc etc

If I do df.ix[5,'2016-06-10'], I get exactly what I want, that is the day's for unit 5 that date

                       Output
Timestamp
2016-06-10 00:00:00    152364.6
2016-06-10 01:00:00    152365.7
2016-06-10 02:00:00    152366.6
...
2016-06-10 21:00:00    152386.6
2016-06-10 22:00:00    152386.6
2016-06-10 23:00:00    152387.6

But the same indexing doesn't seem to work when I have a datetime.date object instead of a string. For example:

tenth = days[9] evaulates to datetime.date(2016, 6, 10), so far so good.

However when I do df.ix[5,tenth] I get "KeyError: u'no item named 2016-06-10'"

So, the '2016-06-10' "does what I mean" and returns all the data on that day. How can I do the same if with a datetime.date?

Upvotes: 1

Views: 1481

Answers (2)

piRSquared
piRSquared

Reputation: 294198

When you pass a string that looks like a datetime to the pandas selector ix, it uses it like a condition and returns all elements that satisfy. In this case, the string you are using evaluates to a day. Pandas runs ix and returns all rows within that day. When you pass the datetime object, it looks for an exact match. You don't have one since all your datetimes are non-zero hour times.

Consider:

s = pd.Series(range(5), pd.date_range('2016-03-31 01:00:00', periods=5, freq='H'))
s

2016-03-31 01:00:00    0
2016-03-31 02:00:00    1
2016-03-31 03:00:00    2
2016-03-31 04:00:00    3
2016-03-31 05:00:00    4
Freq: H, dtype: int64

get all elements on '2016-03-31'

s.ix['2016-03-31']

2016-03-31 01:00:00    0
2016-03-31 02:00:00    1
2016-03-31 03:00:00    2
2016-03-31 04:00:00    3
2016-03-31 05:00:00    4
Freq: H, dtype: int64

Now assign a datetime

date = pd.to_datetime('2016-03-31')

s.ix[date]

And...

KeyError: Timestamp('2016-03-31 00:00:00')

If instead we assigned a datetime where we do have an exact match, we should get a result.

date2 = pd.to_datetime('2016-03-31 02:00:00')

Then

s.ix[date2]

Returns:

1

That worked!

To use datetimes and make it look pretty, use a function:

d2s = lambda d: d.strftime('%Y-%d-%m')

Then

s.ix[d2s(date)]

2016-03-31 01:00:00    0
2016-03-31 02:00:00    1
2016-03-31 03:00:00    2
2016-03-31 04:00:00    3
2016-03-31 05:00:00    4
Freq: H, dtype: int64

Upvotes: 3

Christin Jose
Christin Jose

Reputation: 1507

Let me suggest you a different method rather than using ix. Why don't you directly use a range query?

df = df[df.index.get_level_values('Unit') == 6 & 
        (df.index.get_level_values('Timestamp') >= tenth & 
        (df.index.get_level_values('Timestamp') <= tenth)]

Can you just try whether this is working?

Upvotes: 1

Related Questions