Reputation: 1644
I have a Pandas dataframe from which I need all of the rows where the index matches a certain condition. The dataframe has a MultiIndex, and I need the rows where the first index, the TimeStamp
, is in a specific range. Level 1 of the MultiIndex is a Series of DateTime objects. This following line of code works to check if the month is equal to 5:
compare[compare.index.get_level_values(0).month == 5]
But when I modify the code to check for the rows where the value is in a certain array
compare[compare.index.get_level_values(0).month in [5, 6, 7]]
I get the error
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I also tried using df.loc
to get the values.
compare.loc[compare.index.get_level_values(0).month in [5, 6, 7]]
But this results in the same error.
I also tried using the isin
method instead.
compare[compare.index.get_level_values(0).month.isin([5, 6, 7])]
But this results in the following attribute error:
AttributeError: 'numpy.ndarray' object has no attribute 'isin'
How do I get the rows of the DataFrame where the index meets a specific condition?
Upvotes: 3
Views: 4258
Reputation: 210982
Try this:
compare[compare.index.get_level_values(0).month.isin([5, 6, 7])]
PS this should work for Pandas version 0.18.1+
Demo:
In [45]: import pandas_datareader.data as web
In [46]: df = web.DataReader('AAPL', 'google', '2017-06-01')
In [48]: df = df.assign(i2=np.arange(len(df))).set_index('i2', append=True)
In [49]: df
Out[49]:
Open High Low Close Volume
Date i2
2017-06-01 0 153.17 153.33 152.22 153.18 16404088
2017-06-02 1 153.58 155.45 152.89 155.45 27770715
2017-06-05 2 154.34 154.45 153.46 153.93 25331662
2017-06-06 3 153.90 155.81 153.78 154.45 26624926
2017-06-07 4 155.02 155.98 154.48 155.37 21069647
2017-06-08 5 155.25 155.54 154.40 154.99 21250798
2017-06-09 6 155.19 155.19 146.02 148.98 64882657
2017-06-12 7 145.74 146.09 142.51 145.42 72307330
2017-06-13 8 147.16 147.45 145.15 146.59 34165445
2017-06-14 9 147.50 147.50 143.84 145.16 31531232
... ... ... ... ... ...
2017-07-31 41 149.90 150.33 148.13 148.73 19845920
2017-08-01 42 149.10 150.22 148.41 150.05 35368645
2017-08-02 43 159.28 159.75 156.16 157.14 69936800
2017-08-03 44 157.05 157.21 155.02 155.57 27097296
2017-08-04 45 156.07 157.40 155.69 156.39 20559852
2017-08-07 46 157.06 158.92 156.67 158.81 21870321
2017-08-08 47 158.60 161.83 158.27 160.08 36205896
2017-08-09 48 159.26 161.27 159.11 161.06 26131530
2017-08-10 49 159.90 160.00 154.63 155.32 40804273
2017-08-11 50 156.60 158.57 156.07 157.48 26180743
[51 rows x 5 columns]
In [50]: df[df.index.get_level_values(0).month.isin([5,8])]
Out[50]:
Open High Low Close Volume
Date i2
2017-08-01 42 149.10 150.22 148.41 150.05 35368645
2017-08-02 43 159.28 159.75 156.16 157.14 69936800
2017-08-03 44 157.05 157.21 155.02 155.57 27097296
2017-08-04 45 156.07 157.40 155.69 156.39 20559852
2017-08-07 46 157.06 158.92 156.67 158.81 21870321
2017-08-08 47 158.60 161.83 158.27 160.08 36205896
2017-08-09 48 159.26 161.27 159.11 161.06 26131530
2017-08-10 49 159.90 160.00 154.63 155.32 40804273
2017-08-11 50 156.60 158.57 156.07 157.48 26180743
UPDATE: test with your index values:
In [56]: i = pd.DatetimeIndex(['2016-01-04 01:40:00', '2016-01-04 02:00:00', '2016-01-04 02:10:00', '2016-01-04 02:30:00', '2016-01-04 02:4
...: 0:00'], dtype='datetime64[ns]', name=u'TTimeStamp', freq=None)
In [57]: i
Out[57]: DatetimeIndex(['2016-01-04 01:40:00', '2016-01-04 02:00:00', '2016-01-04 02:10:00', '2016-01-04 02:30:00', '2016-01-04 02:40:00'],
dtype='datetime64[ns]', name='TTimeStamp', freq=None)
In [58]: i.month
Out[58]: Int64Index([1, 1, 1, 1, 1], dtype='int64', name='TTimeStamp')
In [59]: i.month.isin([2,3])
Out[59]: array([False, False, False, False, False], dtype=bool)
In [60]: i.month.isin([1,2,3])
Out[60]: array([ True, True, True, True, True], dtype=bool)
UPDATE2: try the following workaround:
compare[pd.Series(compare.index.get_level_values(0).month).isin([5, 6, 7]).values]
Upvotes: 3