victor
victor

Reputation: 1644

Pandas DataFrame get rows where index matches a certain condition

I have a Pandas dataframe from which I need all of the rows where the index matches a certain condition. The dataframe has a MultiIndex, and I need the rows where the first index, the TimeStamp, is in a specific range. Level 1 of the MultiIndex is a Series of DateTime objects. This following line of code works to check if the month is equal to 5:

compare[compare.index.get_level_values(0).month == 5]

But when I modify the code to check for the rows where the value is in a certain array

compare[compare.index.get_level_values(0).month in [5, 6, 7]]

I get the error

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I also tried using df.loc to get the values.

compare.loc[compare.index.get_level_values(0).month in [5, 6, 7]]

But this results in the same error.

I also tried using the isin method instead.

compare[compare.index.get_level_values(0).month.isin([5, 6, 7])]

But this results in the following attribute error:

AttributeError: 'numpy.ndarray' object has no attribute 'isin'

How do I get the rows of the DataFrame where the index meets a specific condition?

Upvotes: 3

Views: 4258

Answers (1)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210982

Try this:

compare[compare.index.get_level_values(0).month.isin([5, 6, 7])]

PS this should work for Pandas version 0.18.1+

Demo:

In [45]: import pandas_datareader.data as web

In [46]: df = web.DataReader('AAPL', 'google', '2017-06-01')

In [48]: df = df.assign(i2=np.arange(len(df))).set_index('i2', append=True)

In [49]: df
Out[49]:
                 Open    High     Low   Close    Volume
Date       i2
2017-06-01 0   153.17  153.33  152.22  153.18  16404088
2017-06-02 1   153.58  155.45  152.89  155.45  27770715
2017-06-05 2   154.34  154.45  153.46  153.93  25331662
2017-06-06 3   153.90  155.81  153.78  154.45  26624926
2017-06-07 4   155.02  155.98  154.48  155.37  21069647
2017-06-08 5   155.25  155.54  154.40  154.99  21250798
2017-06-09 6   155.19  155.19  146.02  148.98  64882657
2017-06-12 7   145.74  146.09  142.51  145.42  72307330
2017-06-13 8   147.16  147.45  145.15  146.59  34165445
2017-06-14 9   147.50  147.50  143.84  145.16  31531232
...               ...     ...     ...     ...       ...
2017-07-31 41  149.90  150.33  148.13  148.73  19845920
2017-08-01 42  149.10  150.22  148.41  150.05  35368645
2017-08-02 43  159.28  159.75  156.16  157.14  69936800
2017-08-03 44  157.05  157.21  155.02  155.57  27097296
2017-08-04 45  156.07  157.40  155.69  156.39  20559852
2017-08-07 46  157.06  158.92  156.67  158.81  21870321
2017-08-08 47  158.60  161.83  158.27  160.08  36205896
2017-08-09 48  159.26  161.27  159.11  161.06  26131530
2017-08-10 49  159.90  160.00  154.63  155.32  40804273
2017-08-11 50  156.60  158.57  156.07  157.48  26180743

[51 rows x 5 columns]

In [50]: df[df.index.get_level_values(0).month.isin([5,8])]
Out[50]:
                 Open    High     Low   Close    Volume
Date       i2
2017-08-01 42  149.10  150.22  148.41  150.05  35368645
2017-08-02 43  159.28  159.75  156.16  157.14  69936800
2017-08-03 44  157.05  157.21  155.02  155.57  27097296
2017-08-04 45  156.07  157.40  155.69  156.39  20559852
2017-08-07 46  157.06  158.92  156.67  158.81  21870321
2017-08-08 47  158.60  161.83  158.27  160.08  36205896
2017-08-09 48  159.26  161.27  159.11  161.06  26131530
2017-08-10 49  159.90  160.00  154.63  155.32  40804273
2017-08-11 50  156.60  158.57  156.07  157.48  26180743

UPDATE: test with your index values:

In [56]: i = pd.DatetimeIndex(['2016-01-04 01:40:00', '2016-01-04 02:00:00', '2016-01-04 02:10:00', '2016-01-04 02:30:00', '2016-01-04 02:4
    ...: 0:00'], dtype='datetime64[ns]', name=u'TTimeStamp', freq=None)

In [57]: i
Out[57]: DatetimeIndex(['2016-01-04 01:40:00', '2016-01-04 02:00:00', '2016-01-04 02:10:00', '2016-01-04 02:30:00', '2016-01-04 02:40:00'],
dtype='datetime64[ns]', name='TTimeStamp', freq=None)

In [58]: i.month
Out[58]: Int64Index([1, 1, 1, 1, 1], dtype='int64', name='TTimeStamp')

In [59]: i.month.isin([2,3])
Out[59]: array([False, False, False, False, False], dtype=bool)

In [60]: i.month.isin([1,2,3])
Out[60]: array([ True,  True,  True,  True,  True], dtype=bool)

UPDATE2: try the following workaround:

compare[pd.Series(compare.index.get_level_values(0).month).isin([5, 6, 7]).values]

Upvotes: 3

Related Questions