Reiss
Reiss

Reputation: 35

filter dataframe using zipped list

I have a dataframe which includes two data frame columns with a min_peak and a max_peak value. I am attempting to filter the index('Date') values which are timestamps between the two peaks. I would like to allocate a value of 0 for all dates that are greater than the min_peak but less than the max_peak and a value of 1 if false.

Date
2019-02-02    0.3985
2019-09-24    1.4612
2019-12-18    1.5996
2020-03-12    0.0001
Name: min_peak, dtype: float64
Date
2019-07-03    3.4769
2019-11-14    2.9666
2020-03-05    4.6239
2020-06-09    4.3605
Name: max_peak, dtype: float64

I have a list of the zipped dates for the min_peak and max_peak columns but am not sure how to filter my dataframe using the values.

[(Timestamp('2019-02-02 00:00:00'), Timestamp('2019-07-03 00:00:00')), (Timestamp('2019-09-24 00:00:00'), Timestamp('2019-11-14 00:00:00')), (Timestamp('2019-12-18 00:00:00'), Timestamp('2020-03-05 00:00:00')), (Timestamp('2020-03-12 00:00:00'), Timestamp('2020-06-09 00:00:00'))]

As an example I would filter my dataframe based on the first two peaks '2019-02-02 00:00:00' and 2019-07-03 00:00:00 , for all index values greater than 2019-02-02 00:00:00 but less than '2019-07-03 00:00:00 to equal 0.

For all values after '2019-07-03 00:00:00' but less than 2019-09-24 00:00:00' to equal 1.

I have looked tried using the loc method and the df.index.isin but without success.

Upvotes: 0

Views: 211

Answers (1)

Stef
Stef

Reputation: 30609

IIUC you want to set a new colum (flag in my example) to 1 if the index (Date) is within any of the tuples from the list. You can use an IntervalIndex and get_indexer which will return the index position (>= 0) in the intervall index or -1 if the date isn't in any intervall of the index.

Example:

import pandas as pd
from pandas import Timestamp

#make sample data
df = pd.DataFrame(index=pd.date_range('2019-01-01', '2020-06-15', freq='W'))
df['flag'] = 0

#make IntervalIndex
l = [(Timestamp('2019-02-02 00:00:00'), Timestamp('2019-07-03 00:00:00')), (Timestamp('2019-09-24 00:00:00'), Timestamp('2019-11-14 00:00:00')), (Timestamp('2019-12-18 00:00:00'), Timestamp('2020-03-05 00:00:00')), (Timestamp('2020-03-12 00:00:00'), Timestamp('2020-06-09 00:00:00'))]
idx = pd.IntervalIndex.from_tuples(l)


#set flag to 1 for all index values within given intervals
df.loc[idx.get_indexer(df.index)>=0, 'flag'] = 1

Upvotes: 1

Related Questions