Changing the value in a dataframe column depending on the value of two columns in a diffrent dataframe

Question

I have two dataframes made with Pandas in python:

df1

id    business   state   inBusiness
1     painter     AL        no
2     insurance   AL        no
3     lawyer      OH        no
4     dentist     NY        yes
...........

df2  

id    business    state
1     painter       NY
2     painter       AL
3     builder       TX
4     painter       AL    
......

Basically, I want to set the 'inBusiness' value in df1 to 'yes' if an instance of the exact same business/location combo exists in df2.

So for example, if painter/AL exists in df2, than all instances of painter/AL in df1 have their 'inBusiness' value set to yes.

The best I can come up with right now is this:

for index, row in df2.iterrows():
    df1[ (df1.business==str(row['business'])) & (df1.state==str(row['state']))]['inBusiness'] = 'Yes'

but the first dataframe can potentially have hundreds of thousands of rows to loop through for each row in the second dataframe so this method is not very reliable. Is there a nice one-liner I can use here that would also be quick?

michael_j_ward · Accepted Answer

Probably most efficient to create a map

inBusiness = {(business,state): 'yes' 
              for business,state in zip(df2['business'],df2['state'])}
df1['inBusiness'] = [ inBusiness.get((business,state),"no") 
                     for business,state in zip(df1['business'],df1['state'])]
df1

OUTPUTS

    id  business    state   inBusiness
0   1   painter     AL  yes
1   2   insurance   AL  no
2   3   lawyer      OH  no
3   4   dentist     NY  no

Explanation Edit:

You were vague about "explaining further," so I'll give a high level of everything

The built-in zip function that takes two iterables (like two lists, or two series) and "zips" them together into tuples.

a = [1,2,3]
b = ['a','b','c']
for tup in zip(a,b): print(tup)

outputs:

(1, 'a')
(2, 'b')
(3, 'c')

Additionally, tuples in python can be "unpacked" into individual variables

tup = (3,4)
x,y = tup
print(x)
print(y)

You can combine these two things to create dictionary comprehensions

newDict = {k: v for k,v in zip(a,b)}
newDict

Outputs:

{1: 'a', 2: 'b', 3: 'c'}

inBusiness is a python dictionary created using a dictionary comprehension after zipping together the series of df2['business'] and df2['state'].

I did not actually need to unpack the variables, but I did so for what I thought would be clarity.

Note that this map is only half of what you're hopping to do because every key (business,state) in the dictionary maps to yes. Thankfully, dict.get let's us specify a default value to return if the key is not found- which in your case is "no"

Then, the desired column is created using a list-comprehension to achieve your desired result.

Does that cover everything?

Changing the value in a dataframe column depending on the value of two columns in a diffrent dataframe

Answers (2)

Related Questions