Alex B
Alex B

Reputation: 21

How to query python data frame against row of other data frame?

So I have a dataframe (Df1) of say 100 rows. I also have a dataframe (df2) which contains 1 row. For each data frame, a row represents a property. The columns in question are 'NIY' and 'Purchase Price'. I want to write a function that queries DF1 to show results where the NIY is within 0.1 of DF2 and the purchase price is within 2,000,000 of DF2.

DF1 contains multiple columns, of which contain 'NIY' & 'Purchase Price'.

Purchase_Price NIY Purchaser
230000000 0.15 GS
30000000 0.08 JKK

See code below

df1 = pd.read_csv('CopyTransactions.csv', header=3)

df2 = {'Purchase_Price': [25300000], 'NIY': [0.078]}
df2 = pd.DataFrame(data=df2)
df1.query('Purchase_Price > 1000000 & NIY > 0.09')

df1.query('NIY within range g.0['NIY'] + or - 0.01 
     & Purchase_price within range g.0['Purchase_Price'] + or - 2,000,000)

The code works until the bottom part, I want to work out how to correctly code the bottom section. (the 0 in the bottom code means index 0/first row)

Thanks!

Upvotes: 1

Views: 241

Answers (2)

chitown88
chitown88

Reputation: 28630

It's just a matter of filtering/slicing your dataframe.

I would store the values from df2 as a variable. Then could use pandas .between()

import pandas as pd


df1 = {'Purchase_Price': [23000000, 30000000, 26300000], 
       'NIY': [0.15, 0.08, 0.069],
       'Purchaser': ['GS','JKK','JJK']}
df1 = pd.DataFrame(data=df1)

df2 = {'Purchase_Price': [25300000], 'NIY': [0.078]}
df2 = pd.DataFrame(data=df2)


price_check = df2.iloc[0]['Purchase_Price']
niy_check = df2.iloc[0]['NIY']

price = 2000000
niy = 0.01

df = df1[(df1['Purchase_Price'].between(price_check - price, price_check + price)) 
     & (df1['NIY'].between(niy_check - niy, niy_check + niy))]

Output:

print(df)

   Purchase_Price    NIY Purchaser
2        26300000  0.069       JJK

In a function:

import pandas as pd


def filter_df(df1, df_check, price_thresh, niy_thresh):
    price_check = df_check.iloc[0]['Purchase_Price']
    niy_check = df_check.iloc[0]['NIY']
    
    df = df1[(df1['Purchase_Price'].between(price_check - price_thresh, price_check + price_thresh)) 
     & (df1['NIY'].between(niy_check - niy_thresh, niy_check + niy_thresh))]
    
    return df

df = {'Purchase_Price': [23000000, 30000000, 26300000], 
       'NIY': [0.15, 0.08, 0.069],
       'Purchaser': ['GS','JKK','JJK']}
df = pd.DataFrame(data=df)

df_check = {'Purchase_Price': [25300000], 'NIY': [0.078]}
df_check = pd.DataFrame(data=df_check)


new_df = filter_df(df, df_check, price_thresh=2000000, niy_thresh=0.01)

Upvotes: 0

not_speshal
not_speshal

Reputation: 23166

With query, you can try:

>>> df1.query("@df2.Purchase_Price.iat[0]-2000000 <= Purchase_Price <= @df2.Purchase_Price.iat[0]+2000000 and @df2.NIY.iat[0]-0.1 <= NIY <= @df2.NIY.iat[0]+0.1")

   Purchase_Price   NIY Purchaser
2        23500000  0.08       FOO

Upvotes: 2

Related Questions