Select rows from Dataframe with variable number of conditions

Question

I'm trying to write a function that takes as inputs a DataFrame with a column 'timestamp' and a list of tuples. Every tuple will contain a beginning and end time.

What I want to do is to "split" the dataframe in two new ones, where the first contains the rows for which the timestamp value is not contained between the extremes of any tuple, and the other is just the complementary. The number of filter tuples is not known a priori though.

df = DataFrame({'timestamp':[0,1,2,5,6,7,11,22,33,100], 'x':[1,2,3,4,5,6,7,8,9,1])
filt = [(1,4), (10,40)]
left, removed = func(df, filt)

This should give me two dataframes

left: with rows with timestamp [0,5,6,7,100]
removed: with rows with timestamp [1,2,11,22,33]

I believe the right approach is to write a custom function that can be used as a filter, and then call is somehow to filter/mask the dataframe, but I could not find a proper example of how to implement this.

BENY · Accepted Answer

Check

out = df[~pd.concat([df.timestamp.between(*x) for x in filt]).any(level=0)]
Out[175]: 
   timestamp  x
0          0  1
3          5  4
4          6  5
5          7  6
9        100  1

Select rows from Dataframe with variable number of conditions

Answers (2)

Related Questions