Yaxit
Yaxit

Reputation: 187

Select rows from Dataframe with variable number of conditions

I'm trying to write a function that takes as inputs a DataFrame with a column 'timestamp' and a list of tuples. Every tuple will contain a beginning and end time.

What I want to do is to "split" the dataframe in two new ones, where the first contains the rows for which the timestamp value is not contained between the extremes of any tuple, and the other is just the complementary. The number of filter tuples is not known a priori though.

df = DataFrame({'timestamp':[0,1,2,5,6,7,11,22,33,100], 'x':[1,2,3,4,5,6,7,8,9,1])
filt = [(1,4), (10,40)]
left, removed = func(df, filt)

This should give me two dataframes

I believe the right approach is to write a custom function that can be used as a filter, and then call is somehow to filter/mask the dataframe, but I could not find a proper example of how to implement this.

Upvotes: 0

Views: 95

Answers (2)

BENY
BENY

Reputation: 323226

Check

out = df[~pd.concat([df.timestamp.between(*x) for x in filt]).any(level=0)]
Out[175]: 
   timestamp  x
0          0  1
3          5  4
4          6  5
5          7  6
9        100  1

Upvotes: 3

wasif
wasif

Reputation: 15470

Can't you use filtering with .isin():

left,removed = df[df['timestamp'].isin([0,5,6,7,100])],df[df['timestamp'].isin([1,2,11,22,33])]

Upvotes: 0

Related Questions