Reputation: 303
I am trying to use my df as a lookup table and determine whether a string contains any value in that df. A simple example:
str = 'John Smith Business Analyst'
df = pd.read_pickle('job_titles.pickle')
The df has one column containing several job titles:
df = accountant, lawyer, CFO, business analyst, etc.
I want to determine that str contains the substring 'Business Analyst', because that value is in my df.
The returned result would be the substring 'Business Analyst'.
If the original str was:
str = 'John Smith Business'
Then the return would be empty since no substring matches a string in the df.
I have it working when each value is a single word. For example:
df = pd.read_pickle('cities.pickle')
df = Calgary, Edmonton, Toronto, etc
str = 'John Smith Business Analyst Calgary AB Canada'
str_list = str.split()
for word in str_list:
    df_location = df[df['name'].str.match(word)]
    if not df_location.empty:
        break
df_location = Calgary
The city is found in the df, and that one row is returned. I am just not sure how to do this when the value is more than one word.
Upvotes: 0
Views: 104
Reputation: 206
I am not sure what you want to do with the returned value exactly, but here is a way to identify it at least. First, I made a toy dataframe:
import pandas as pd
titles_df = pd.DataFrame({'title' : ['Business Analyst', 'Data Scientist', 'Plumber', 'Baker', 'Accountant', 'CEO']})
search_name = 'John Smith Business Analyst'
titles_df
              title
0  Business Analyst
1    Data Scientist
2           Plumber
3             Baker
4        Accountant
5               CEO
Then, I loop through the values in the title column to see if any of them are in the search term:
for val in titles_df['title'].values:
    if val in search_name:
        print(val)
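The loop above is case-sensitive and prints every hit. Here is a minimal sketch of a helper that returns a single match instead. It assumes a case-insensitive comparison is wanted (the question's lookup table lists 'business analyst' in lower case) and that the longest matching title should win. `find_title` is a hypothetical name, not a pandas function, and the plain list stands in for something like `titles_df['title'].tolist()`:

```python
def find_title(text, titles):
    """Return the longest title contained in text (case-insensitive), or '' if none match."""
    lowered = text.lower()
    matches = [t for t in titles if t.lower() in lowered]
    return max(matches, key=len) if matches else ''

# Stand-in for the lookup column, e.g. titles_df['title'].tolist()
titles = ['Business Analyst', 'Analyst', 'CEO']

print(find_title('John Smith Business Analyst', titles))
print(find_title('John Smith Business', titles))
```

Note that this is still a plain substring test, so a short title such as 'CEO' would also match inside a longer word that happens to contain those letters.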
If you want to do this for all the names in a dataframe column and store the matched title in a new column, you can do the following:
First, I create a dataframe with some names:
names_df = pd.DataFrame({'name' : ['John Smith Business Analyst', 'Dorothy Roberts CEO', 'Jim Miller Dancer', 'Samuel Adams Accountant']})
Then, I loop through the values of names and values of titles and assign the matched titles to a title column in the names dataframe (unmatched ones will have an empty string):
names_df['title'] = ''
for name in names_df['name'].values:
    for title in titles_df['title'].values:
        if title in name:
            # Use .loc instead of chained indexing, which may silently fail to update names_df
            names_df.loc[names_df['name'] == name, 'title'] = title
names_df
                          name             title
0  John Smith Business Analyst  Business Analyst
1          Dorothy Roberts CEO               CEO
2            Jim Miller Dancer
3      Samuel Adams Accountant        Accountant
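The nested loop above compares every name with every title in Python. As a possible alternative, here is a sketch that builds one regular expression from the titles and lets `Series.str.extract` do the matching. It assumes case-insensitive matching is acceptable and that only the first match per name is needed; the variable names `ordered` and `pattern` are introduced here for illustration:

```python
import re
import pandas as pd

titles_df = pd.DataFrame({'title': ['Business Analyst', 'Data Scientist', 'Plumber',
                                    'Baker', 'Accountant', 'CEO']})
names_df = pd.DataFrame({'name': ['John Smith Business Analyst', 'Dorothy Roberts CEO',
                                  'Jim Miller Dancer', 'Samuel Adams Accountant']})

# Put longer titles first so a longer title is tried before a shorter one it contains.
ordered = sorted(titles_df['title'], key=len, reverse=True)

# One capturing group with every title as an alternative; re.escape guards special characters.
pattern = '(' + '|'.join(re.escape(t) for t in ordered) + ')'

# expand=False returns a Series; names without a match come back as NaN, replaced by ''.
names_df['title'] = (
    names_df['name']
    .str.extract(pattern, flags=re.IGNORECASE, expand=False)
    .fillna('')
)
print(names_df)
```

The extracted text is taken from the name itself, so it keeps the name's capitalization rather than the lookup table's.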
Upvotes: 1