Reputation: 35
I have a dataframe which includes two data frame columns with a min_peak and a max_peak value. I am attempting to filter the index('Date') values which are timestamps between the two peaks. I would like to allocate a value of 0 for all dates that are greater than the min_peak but less than the max_peak and a value of 1 if false.
Date
2019-02-02 0.3985
2019-09-24 1.4612
2019-12-18 1.5996
2020-03-12 0.0001
Name: min_peak, dtype: float64
Date
2019-07-03 3.4769
2019-11-14 2.9666
2020-03-05 4.6239
2020-06-09 4.3605
Name: max_peak, dtype: float64
I have a list of the zipped dates for the min_peak and max_peak columns but am not sure how to filter my dataframe using the values.
[(Timestamp('2019-02-02 00:00:00'), Timestamp('2019-07-03 00:00:00')), (Timestamp('2019-09-24 00:00:00'), Timestamp('2019-11-14 00:00:00')), (Timestamp('2019-12-18 00:00:00'), Timestamp('2020-03-05 00:00:00')), (Timestamp('2020-03-12 00:00:00'), Timestamp('2020-06-09 00:00:00'))]
As an example I would filter my dataframe based on the first two peaks '2019-02-02 00:00:00'
and 2019-07-03 00:00:00
, for all index values greater than 2019-02-02 00:00:00
but less than '2019-07-03 00:00:00
to equal 0.
For all values after '2019-07-03 00:00:00'
but less than 2019-09-24 00:00:00'
to equal 1.
I have looked tried using the loc
method and the df.index.isin
but without success.
Upvotes: 0
Views: 211
Reputation: 30609
IIUC you want to set a new colum (flag
in my example) to 1
if the index (Date
) is within any of the tuples from the list. You can use an IntervalIndex
and get_indexer
which will return the index position (>= 0
) in the intervall index or -1
if the date isn't in any intervall of the index.
Example:
import pandas as pd
from pandas import Timestamp
#make sample data
df = pd.DataFrame(index=pd.date_range('2019-01-01', '2020-06-15', freq='W'))
df['flag'] = 0
#make IntervalIndex
l = [(Timestamp('2019-02-02 00:00:00'), Timestamp('2019-07-03 00:00:00')), (Timestamp('2019-09-24 00:00:00'), Timestamp('2019-11-14 00:00:00')), (Timestamp('2019-12-18 00:00:00'), Timestamp('2020-03-05 00:00:00')), (Timestamp('2020-03-12 00:00:00'), Timestamp('2020-06-09 00:00:00'))]
idx = pd.IntervalIndex.from_tuples(l)
#set flag to 1 for all index values within given intervals
df.loc[idx.get_indexer(df.index)>=0, 'flag'] = 1
Upvotes: 1