Reputation: 21
So I have a dataframe (Df1) of say 100 rows. I also have a dataframe (df2) which contains 1 row. For each data frame, a row represents a property. The columns in question are 'NIY' and 'Purchase Price'. I want to write a function that queries DF1 to show results where the NIY is within 0.1 of DF2 and the purchase price is within 2,000,000 of DF2.
DF1 contains multiple columns, of which contain 'NIY' & 'Purchase Price'.
Purchase_Price | NIY | Purchaser |
---|---|---|
230000000 | 0.15 | GS |
30000000 | 0.08 | JKK |
See code below
df1 = pd.read_csv('CopyTransactions.csv', header=3)
df2 = {'Purchase_Price': [25300000], 'NIY': [0.078]}
df2 = pd.DataFrame(data=df2)
df1.query('Purchase_Price > 1000000 & NIY > 0.09')
df1.query('NIY within range g.0['NIY'] + or - 0.01
& Purchase_price within range g.0['Purchase_Price'] + or - 2,000,000)
The code works until the bottom part, I want to work out how to correctly code the bottom section. (the 0 in the bottom code means index 0/first row)
Thanks!
Upvotes: 1
Views: 241
Reputation: 28630
It's just a matter of filtering/slicing your dataframe.
I would store the values from df2 as a variable. Then could use pandas
.between()
import pandas as pd
df1 = {'Purchase_Price': [23000000, 30000000, 26300000],
'NIY': [0.15, 0.08, 0.069],
'Purchaser': ['GS','JKK','JJK']}
df1 = pd.DataFrame(data=df1)
df2 = {'Purchase_Price': [25300000], 'NIY': [0.078]}
df2 = pd.DataFrame(data=df2)
price_check = df2.iloc[0]['Purchase_Price']
niy_check = df2.iloc[0]['NIY']
price = 2000000
niy = 0.01
df = df1[(df1['Purchase_Price'].between(price_check - price, price_check + price))
& (df1['NIY'].between(niy_check - niy, niy_check + niy))]
Output:
print(df)
Purchase_Price NIY Purchaser
2 26300000 0.069 JJK
In a function:
import pandas as pd
def filter_df(df1, df_check, price_thresh, niy_thresh):
price_check = df_check.iloc[0]['Purchase_Price']
niy_check = df_check.iloc[0]['NIY']
df = df1[(df1['Purchase_Price'].between(price_check - price_thresh, price_check + price_thresh))
& (df1['NIY'].between(niy_check - niy_thresh, niy_check + niy_thresh))]
return df
df = {'Purchase_Price': [23000000, 30000000, 26300000],
'NIY': [0.15, 0.08, 0.069],
'Purchaser': ['GS','JKK','JJK']}
df = pd.DataFrame(data=df)
df_check = {'Purchase_Price': [25300000], 'NIY': [0.078]}
df_check = pd.DataFrame(data=df_check)
new_df = filter_df(df, df_check, price_thresh=2000000, niy_thresh=0.01)
Upvotes: 0
Reputation: 23166
With query
, you can try:
>>> df1.query("@df2.Purchase_Price.iat[0]-2000000 <= Purchase_Price <= @df2.Purchase_Price.iat[0]+2000000 and @df2.NIY.iat[0]-0.1 <= NIY <= @df2.NIY.iat[0]+0.1")
Purchase_Price NIY Purchaser
2 23500000 0.08 FOO
Upvotes: 2