Reputation: 6753
I have two dataframes made with Pandas in python:
df1
id business state inBusiness
1 painter AL no
2 insurance AL no
3 lawyer OH no
4 dentist NY yes
...........
df2
id business state
1 painter NY
2 painter AL
3 builder TX
4 painter AL
......
Basically, I want to set the 'inBusiness' value in df1 to 'yes' if an instance of the exact same business/location combo exists in df2.
So for example, if painter/AL exists in df2, than all instances of painter/AL in df1 have their 'inBusiness' value set to yes.
The best I can come up with right now is this:
for index, row in df2.iterrows():
df1[ (df1.business==str(row['business'])) & (df1.state==str(row['state']))]['inBusiness'] = 'Yes'
but the first dataframe can potentially have hundreds of thousands of rows to loop through for each row in the second dataframe so this method is not very reliable. Is there a nice one-liner I can use here that would also be quick?
Upvotes: 0
Views: 69
Reputation: 42875
You could use .merge(how='left', indicator=True)
(indicator
was added in pandas>=0.17
, see docs) to identify matching columns as well as the source of the match to get something along these lines:
df1.merge(df2, how='left', indicator=True) # merges by default on shared columns
id business state inBusiness _merge
0 1 painter AL no both
1 2 insurance AL no left_only
2 3 lawyer OH no left_only
3 4 dentist NY yes left_only
The _merge
indicates in which cases the (business, state)
combinations are available in both df1
and df2
. Then you just need to:
df['inBusiness'] = df._merge == 'both'
to get:
id business state inBusiness _merge
0 1 painter AL True both
1 2 insurance AL False left_only
2 3 lawyer OH False left_only
3 4 dentist NY False left_only
Upvotes: 2
Reputation: 4559
Probably most efficient to create a map
inBusiness = {(business,state): 'yes'
for business,state in zip(df2['business'],df2['state'])}
df1['inBusiness'] = [ inBusiness.get((business,state),"no")
for business,state in zip(df1['business'],df1['state'])]
df1
OUTPUTS
id business state inBusiness
0 1 painter AL yes
1 2 insurance AL no
2 3 lawyer OH no
3 4 dentist NY no
Explanation Edit:
You were vague about "explaining further," so I'll give a high level of everything
The built-in zip
function that takes two iterables (like two lists, or two series) and "zips" them together into tuples.
a = [1,2,3]
b = ['a','b','c']
for tup in zip(a,b): print(tup)
outputs:
(1, 'a')
(2, 'b')
(3, 'c')
Additionally, tuples in python can be "unpacked" into individual variables
tup = (3,4)
x,y = tup
print(x)
print(y)
You can combine these two things to create dictionary comprehensions
newDict = {k: v for k,v in zip(a,b)}
newDict
Outputs:
{1: 'a', 2: 'b', 3: 'c'}
inBusiness
is a python dictionary created using a dictionary comprehension after zipping together the series of df2['business']
and df2['state']
.
I did not actually need to unpack the variables, but I did so for what I thought would be clarity.
Note that this map is only half of what you're hopping to do because every key (business,state)
in the dictionary maps to yes
. Thankfully, dict.get
let's us specify a default value to return if the key is not found- which in your case is "no"
Then, the desired column is created using a list-comprehension to achieve your desired result.
Does that cover everything?
Upvotes: 1