Reputation: 62
I am new to using Pandas. I want to select rows from a dataframe where multiple columns match in value. Along the lines of:
if column A equals column AB and column B equals column BC
then I want those values.
I haven't actually used an if statement, I read iteration was not good to use with pandas.
I've tried to find a solution, I'm not sure if it is my syntax or if its unhappy with different data types of the columns?
My code is a little long, so I'll provided just the line where I attempt the selection but I can post the entire code if that is helpful.
dfequal=dfMerged.loc[(dfMerged['MetCode']==dfMerged['GCD_METCODE']) & (dfMerged[dfMerged['Zone Code']==dfMerged['GCD_Senior_ZONE']]) & (dfMerged[dfMerged['Municipality Code']==dfMerged['GCD_CSDUID']])]
Edit*
The expected output would be a dataframe where only rows where the statement is true would exist.
This is the error:
ValueError: operands could not be broadcast together with shapes (84778,) (4462,)
This is my data table i'm pulling from
FileID,MetCode,Municipality Code,Zone
Code,GCD_Senior_ZONE,GCD_METCODE,GCD_CSDUID
A100101,7175,1005018,303006,303006,7175,1005018
A100102,7175,1005018,303006,303006,7175,1005018
A100103,7175,1005018,303006,303006,7175,1005018
A100104,7280,1006009,202003,202003,7280,1006009
A100105,7300,1006017,202003,202003,7300,1006017
A100108,7300,1006017,202003,202003,7300,1006017
A100109,7300,1006017,202003,202003,7300,1006017
A100110,1640,1001485,101001,101001,1640,1001485
A100111,1640,1001517,101001,101001,1640,1001517
A100114,9000,1008011,202003,202003,0,1008011
A100115,9000,1001370,101002,101002,0,1001370
A100119,9000,1003034,202003,202003,0,1003034
Upvotes: 0
Views: 3734
Reputation: 1409
You'll simply need to add the conditions inside parenthesis inside your .loc
and not repeat a DF filter inside the df filter:
First, creating a crude datasample, as you didn't provide one besides the image:
# creating the values, first one will be ID, then next 4 will be the values to compare
check_values = [
[1, 5, 10, 20, 30],
[2, 5, 11, 32, 11],
[3, 10, 10, 20, 20],
[4, 9, 9, 11, 11],
[5, 11, 23, 41, 11]
]
# creating columns names
check_cols = ['id', 'A', 'B', 'C', 'D']
# making the DataFrame
dfcheck = pd.DataFrame(check_values, columns=check_cols)
# Setting the id column, just because
dfcheck.set_index('id', inplace=True)
The solution, where you need to nest each condition inside parenthesis:
dfcheck.loc[(dfcheck['A'] == dfcheck['B']) & (dfcheck['C'] == dfcheck['D'])]
EDIT: What you missed/did wrong?:
Looking at your filter, you're adding unecessary dfMerged inside your parenthesis, your code broken in lines (delete everything inside "** CODE **"):
dfequal=
dfMerged.loc[(dfMerged['MetCode']==dfMerged['GCD_METCODE'])
& (**dfMerged[**dfMerged['Zone Code']==dfMerged['GCD_Senior_ZONE']**]**)
& (**dfMerged[**dfMerged['Municipality Code']==dfMerged['GCD_CSDUID']**]**)]
So you see, that you're searching inside a search that it's not needed? It should be:
dfequal=
dfMerged.loc[(dfMerged['MetCode']==dfMerged['GCD_METCODE'])
& (dfMerged['Zone Code']==dfMerged['GCD_Senior_ZONE'])
& (dfMerged['Municipality Code']==dfMerged['GCD_CSDUID'])]
Upvotes: 3