Dark Light
Dark Light

Reputation: 1250

Selecting rows with specified days in datetimeindex dataframe - Pandas

I have a dataframe with datetimeindex. I only need those rows whose index belong to days specified in a list e.g. [1,2] for Monday and Tuesday. Can this be possible in pandas in a single line of code.

Upvotes: 2

Views: 2472

Answers (2)

johnchase
johnchase

Reputation: 13705

You could try this:

In [3]: import pandas as pd
In [4]: import numpy as np

In [5]: index = pd.date_range('11/23/2015', end = '11/30/2015', freq='d')
In [6]: df = pd.DataFrame(np.random.randn(len(index),2),columns=list('AB'),index=index)

In [7]: df
Out[7]:
                   A         B
2015-11-23 -0.673626 -1.009921
2015-11-24 -1.288852 -0.338795
2015-11-25 -1.414042 -0.767050
2015-11-26  0.018223 -0.726230
2015-11-27 -1.288709 -1.144437
2015-11-28  0.121093  1.396825
2015-11-29 -0.791611 -1.014375
2015-11-30  1.223220 -1.223499


In [8]: df[np.in1d(df.index.dayofweek, [1, 2])]
Out[8]:
                   A         B
2015-11-24  0.116678 -0.715655
2015-11-25 -1.494921  0.218176

1 is actually Tuesday here. But that should be fairly easy to account for if needed.

The previous answer was posted while writing this, as a comparison:

In [15]: %timeit df.loc[df.index.to_series().dt.dayofweek.isin([0,1]).values]
100 loops, best of 3: 2.01 ms per loop

In [16]: %timeit df[np.in1d(df.index.dayofweek, [0, 1])]
1000 loops, best of 3: 393 µs per loop

Note this comparison was done on the test DF I created and I don't know how it necessarily extends to larger dataframes, though performance should be consistent.

Upvotes: 3

EdChum
EdChum

Reputation: 393883

IIUC then the following should work:

df[df.index.to_series().dt.dayofweek.isin([0,1])]

Example:

In [9]:
df = pd.DataFrame(index=pd.date_range(start=dt.datetime(2015,1,1), end = dt.datetime(2015,2,1)))
df[df.index.to_series().dt.dayofweek.isin([0,1])]

Out[9]:
Empty DataFrame
Columns: []
Index: [2015-01-05 00:00:00, 2015-01-06 00:00:00, 2015-01-12 00:00:00, 2015-01-13 00:00:00, 2015-01-19 00:00:00, 2015-01-20 00:00:00, 2015-01-26 00:00:00, 2015-01-27 00:00:00]

So this converts the DateTimeIndex to a Series so that we can call isin to test for membership, using .dt.dayofweek and passing 0,1 (this corresponds to Monday and Tuedsay), we use the boolean mask to mask the index

Another way is to construct a boolean mask without converting to a Series:

In [12]:
df[(df.index.dayofweek == 0) | (df.index.dayofweek == 1)]

Out[12]:
Empty DataFrame
Columns: []
Index: [2015-01-05 00:00:00, 2015-01-06 00:00:00, 2015-01-12 00:00:00, 2015-01-13 00:00:00, 2015-01-19 00:00:00, 2015-01-20 00:00:00, 2015-01-26 00:00:00, 2015-01-27 00:00:00]

Or in fact this would work:

In [13]:
df[df.index.dayofweek < 2]

Out[13]:
Empty DataFrame
Columns: []
Index: [2015-01-05 00:00:00, 2015-01-06 00:00:00, 2015-01-12 00:00:00, 2015-01-13 00:00:00, 2015-01-19 00:00:00, 2015-01-20 00:00:00, 2015-01-26 00:00:00, 2015-01-27 00:00:00]

TIMINGS

In [14]:
%timeit df[df.index.dayofweek < 2]
%timeit df[np.in1d(df.index.dayofweek, [1, 2])]

1000 loops, best of 3: 464 µs per loop
1000 loops, best of 3: 521 µs per loop

So my last method is slightly faster here than the np method

Upvotes: 3

Related Questions