Reputation: 1250
I have a dataframe with datetimeindex. I only need those rows whose index belong to days specified in a list e.g. [1,2] for Monday and Tuesday. Can this be possible in pandas in a single line of code.
Upvotes: 2
Views: 2472
Reputation: 13705
You could try this:
In [3]: import pandas as pd
In [4]: import numpy as np
In [5]: index = pd.date_range('11/23/2015', end = '11/30/2015', freq='d')
In [6]: df = pd.DataFrame(np.random.randn(len(index),2),columns=list('AB'),index=index)
In [7]: df
Out[7]:
A B
2015-11-23 -0.673626 -1.009921
2015-11-24 -1.288852 -0.338795
2015-11-25 -1.414042 -0.767050
2015-11-26 0.018223 -0.726230
2015-11-27 -1.288709 -1.144437
2015-11-28 0.121093 1.396825
2015-11-29 -0.791611 -1.014375
2015-11-30 1.223220 -1.223499
In [8]: df[np.in1d(df.index.dayofweek, [1, 2])]
Out[8]:
A B
2015-11-24 0.116678 -0.715655
2015-11-25 -1.494921 0.218176
1 is actually Tuesday here. But that should be fairly easy to account for if needed.
The previous answer was posted while writing this, as a comparison:
In [15]: %timeit df.loc[df.index.to_series().dt.dayofweek.isin([0,1]).values]
100 loops, best of 3: 2.01 ms per loop
In [16]: %timeit df[np.in1d(df.index.dayofweek, [0, 1])]
1000 loops, best of 3: 393 µs per loop
Note this comparison was done on the test DF I created and I don't know how it necessarily extends to larger dataframes, though performance should be consistent.
Upvotes: 3
Reputation: 393883
IIUC then the following should work:
df[df.index.to_series().dt.dayofweek.isin([0,1])]
Example:
In [9]:
df = pd.DataFrame(index=pd.date_range(start=dt.datetime(2015,1,1), end = dt.datetime(2015,2,1)))
df[df.index.to_series().dt.dayofweek.isin([0,1])]
Out[9]:
Empty DataFrame
Columns: []
Index: [2015-01-05 00:00:00, 2015-01-06 00:00:00, 2015-01-12 00:00:00, 2015-01-13 00:00:00, 2015-01-19 00:00:00, 2015-01-20 00:00:00, 2015-01-26 00:00:00, 2015-01-27 00:00:00]
So this converts the DateTimeIndex
to a Series
so that we can call isin
to test for membership, using .dt.dayofweek
and passing 0,1
(this corresponds to Monday and Tuedsay), we use the boolean mask to mask the index
Another way is to construct a boolean mask without converting to a Series
:
In [12]:
df[(df.index.dayofweek == 0) | (df.index.dayofweek == 1)]
Out[12]:
Empty DataFrame
Columns: []
Index: [2015-01-05 00:00:00, 2015-01-06 00:00:00, 2015-01-12 00:00:00, 2015-01-13 00:00:00, 2015-01-19 00:00:00, 2015-01-20 00:00:00, 2015-01-26 00:00:00, 2015-01-27 00:00:00]
Or in fact this would work:
In [13]:
df[df.index.dayofweek < 2]
Out[13]:
Empty DataFrame
Columns: []
Index: [2015-01-05 00:00:00, 2015-01-06 00:00:00, 2015-01-12 00:00:00, 2015-01-13 00:00:00, 2015-01-19 00:00:00, 2015-01-20 00:00:00, 2015-01-26 00:00:00, 2015-01-27 00:00:00]
TIMINGS
In [14]:
%timeit df[df.index.dayofweek < 2]
%timeit df[np.in1d(df.index.dayofweek, [1, 2])]
1000 loops, best of 3: 464 µs per loop
1000 loops, best of 3: 521 µs per loop
So my last method is slightly faster here than the np method
Upvotes: 3