Reputation: 17
There is two dataframe, first one office location
office = pd.DataFrame({'office_id': [1,2,3],
'location': ['Denver','Sacramento','Phoenix']})
Second one is employee data
employee = pd.DataFrame({'employee_id': [A1,B3,C2,D2],
'name':['Anthony','Benjamin','Charles','Daniel'],
'address': ['123, Sacramento, California','Denver456, Colorado','Phoenix, Arizona','789 Dover']})
I want to get the employee_id and location which available in office location, so the result will be:
employee_id location
A1 Sacramento
B3 Denver
C2 Phoenix
I try to use
locations = [any(i in words for i in office['location'].values) for words in employee['address'].str.split().values]
but I dont know how to change address column to location
Upvotes: 0
Views: 37
Reputation: 37747
Here is a proposition using pandas.Series.explode
with pandas.DataFrame.merge
:
out = (
employee.assign(address= employee["address"].str.findall("([A-Za-z]+)"))
.explode("address")
.merge(office, left_on="address", right_on="location")
.loc[:, ["employee_id", "location"]]
)
print(out)
employee_id location
0 A1 Sacramento
1 B3 Denver
2 C2 Phoenix
Upvotes: 1