Michael Arens
Michael Arens

Reputation: 11

Pandas | Filter DF rows with Integers that lie between two Integer values in another Dataframe

I got two Dataframes. The goal is to filter out rows in DF1 that have an Integer value that lies between any of the Integers in the ["Begin"] and ["End"] columns in any of the 37 rows in DF2.

DF1:

INDEX       String         IntValues
1           "string"       808091
2           "string"       1168262
3           "string"       1169294
...          ...           ...  
647         "string"       14193661
648         "string"       14551918

DF2:

Index   Begin        End
1       1196482.2    1216529
2       1791819.7    1834887
3       2008405.1    2014344
...     ...          ...
36      14168540.0   14193933
37      14727507.1   14779605

I think it is possible to use something like :

df1[(df1["IntValues"] >=1196482.2 ) & (df1["IntValues"] <= 1216529),(... 36 more conditions)].

Is there a better way than just writing down these 37 conditions, like a variable for the begin and end values of that "filter window" ?

Edit: As requested a code sample, not from the original DF, but i hope it suffices.

d1 = {
"string":["String0", "String1", "String2", "String3", "String4", "String5", "String6", "String7", "String8", "String9", "String10", "String11", "String12", "String13", "String14"],
"timestamp":[1168262,1169294, 1184451, 1210449,1210543,1210607, 1644328, 
             1665732, 1694388,1817309,1822872,1825310,2093796,2182923,2209252 ],
"should be in": ["Yes", "Yes", "Yes", "No", "No","No", "yes","yes","yes","no","no", "no","yes","yes","no"]
}

df1 = pd.DataFrame(d1)

d2={
'begin' : [1196482.2,1791819.7,2199564.6],
'end' : [1216529,1834887,2212352]
 }

df2 = pd.DataFrame(d2)

Upvotes: 1

Views: 115

Answers (1)

Suhas Mucherla
Suhas Mucherla

Reputation: 1413

Try this:

df_final=[]
for i,j in zip(df2["Begin"],df2["End"]):
    x=df1[(df1["IntValues"] >=i ) & (df1["IntValues"] <= j)]
    df_final.append(x)
    

df_final=pd.concat(df_final,axis=0).reset_index(drop=True)
df_final=df_final.drop_duplicates()

Upvotes: 2

Related Questions