Arthur
Arthur

Reputation: 17

Match Mess Up Address Employee Contains Office Location

There is two dataframe, first one office location

office = pd.DataFrame({'office_id': [1,2,3],
                    'location': ['Denver','Sacramento','Phoenix']})

Second one is employee data

employee = pd.DataFrame({'employee_id': [A1,B3,C2,D2],
                       'name':['Anthony','Benjamin','Charles','Daniel'],
                    'address': ['123, Sacramento, California','Denver456, Colorado','Phoenix, Arizona','789 Dover']})  

I want to get the employee_id and location which available in office location, so the result will be:

employee_id  location
A1           Sacramento
B3           Denver
C2           Phoenix

I try to use

locations = [any(i in words for i in office['location'].values) for words in employee['address'].str.split().values]

but I dont know how to change address column to location

Upvotes: 0

Views: 37

Answers (1)

Timeless
Timeless

Reputation: 37747

Here is a proposition using pandas.Series.explode with pandas.DataFrame.merge :

out = (
        employee.assign(address= employee["address"].str.findall("([A-Za-z]+)"))
                .explode("address")
                .merge(office, left_on="address", right_on="location")
                .loc[:, ["employee_id", "location"]]
      )

# Output :

print(out)
  employee_id    location
0          A1  Sacramento
1          B3      Denver
2          C2     Phoenix

Upvotes: 1

Related Questions