Jeff
Jeff

Reputation: 417

How to select rows within a pandas dataframe based on time only when index is date and time

I have a dataframe that looks like this:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2016910 entries, 2009-01-02 04:51:00 to 2012-11-02 20:00:00
Freq: T
Data columns:
X1    2016910  non-null values
X2    2016910  non-null values
X3    2016910  non-null values
X4    2016910  non-null values
X5    2016910  non-null values
dtypes: float64(5)

and I would like to "filter" it by accessing only certain times across the whole range of dates. For example, I'd like to return a dataframe that contains all rows where the time is between 13:00:00 and 14:00:00, but for all of the dates. I am reading the data from a CSV file and the datetime is one column, but I could just as easily make the input CSV file contain a separate date and time. I tried the separate date and time route, and created a multiindex, but when I did, I ended up with two index columns -- one of them containing the proper date with an incorrect time instead of just a date, and the second one containing an incorrect date, and then a correct time, instead of just a time. The input data for my multiindex attempt looked like this:

 20090102,04:51:00,89.9900,89.9900,89.9900,89.9900,100
 20090102,05:36:00,90.0100,90.0100,90.0100,90.0100,200
 20090102,05:44:00,90.1400,90.1400,90.1400,90.1400,100
 20090102,05:50:00,90.0500,90.0500,90.0500,90.0500,500
 20090102,05:56:00,90.1000,90.1000,90.1000,90.1000,300
 20090102,05:57:00,90.1000,90.1000,90.1000,90.1000,200

which I tried to read using this code:

 singledf = pd.DataFrame.from_csv("inputfile",header=None,index_col=[0,1],parse_dates=True)

which resulted in a dataframe that looks like this:

singledf.sort()
singledf

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 716244 entries, (<Timestamp: 2009-01-02 00:00:00>, <Timestamp: 2012-11-04      04:51:00>) to (<Timestamp: 2012-11-02 00:00:00>, <Timestamp: 2012-11-04 20:00:00>)
Data columns:
X2    716244  non-null values
X3    716244  non-null values
X4    716244  non-null values
X5    716244  non-null values
X6    716244  non-null values
dtypes: float64(4), int64(1)

Maybe the multiindex approach is totally wrong, but it's one thing I tried. It seems like it is stuck on using a datetime object, and wants to force the index columns to have a datetime instead of just a date or a time. My source CSV files for the my non-multiindex attempt looks like this:

20090102 04:51:00,89.9900,89.9900,89.9900,89.9900,100
20090102 05:36:00,90.0100,90.0100,90.0100,90.0100,200
20090102 05:44:00,90.1400,90.1400,90.1400,90.1400,100
20090102 05:50:00,90.0500,90.0500,90.0500,90.0500,500
20090102 05:56:00,90.1000,90.1000,90.1000,90.1000,300

I am using pandas .9. Any suggestions are appreciated!

Upvotes: 1

Views: 4510

Answers (1)

Wouter Overmeire
Wouter Overmeire

Reputation: 69096

A regular DatetimeIndex allows to use between_time method.

In [12]: data = """\
 20090102,04:51:00,89.9900,89.9900,89.9900,89.9900,100
 20090102,05:36:00,90.0100,90.0100,90.0100,90.0100,200
 20090102,05:44:00,90.1400,90.1400,90.1400,90.1400,100
 20090102,05:50:00,90.0500,90.0500,90.0500,90.0500,500
 20090102,05:56:00,90.1000,90.1000,90.1000,90.1000,300
 20090102,05:57:00,90.1000,90.1000,90.1000,90.1000,200
"""

In [13]: singledf = pd.DataFrame.from_csv(StringIO(data), header=None, parse_dates=[[0,1]])

In [14]: singledf
Out[14]:
                        X2     X3     X4     X5   X6
X0_X1
2009-01-02 04:51:00  89.99  89.99  89.99  89.99  100
2009-01-02 05:36:00  90.01  90.01  90.01  90.01  200
2009-01-02 05:44:00  90.14  90.14  90.14  90.14  100
2009-01-02 05:50:00  90.05  90.05  90.05  90.05  500
2009-01-02 05:56:00  90.10  90.10  90.10  90.10  300
2009-01-02 05:57:00  90.10  90.10  90.10  90.10  200

In [15]: singledf.between_time('5:30:00', '5:45:00')
Out[15]:
                        X2     X3     X4     X5   X6
X0_X1
2009-01-02 05:36:00  90.01  90.01  90.01  90.01  200
2009-01-02 05:44:00  90.14  90.14  90.14  90.14  100

Upvotes: 2

Related Questions