Reputation: 907
I wrote a script that checks whether values in the 'Product content' sheet (column 'TITLE') match values in the 'Keyword list' sheet (column 'KEYWORD') of the same workbook. The compare_title function returns True or False, which is fine, but I also need to know which keywords matched: not only the True/False output but also the word that counts as a 'True match'.
The Python script is below.
```python
import pandas as pd
import re

file_path = 'C:/Users/User/Desktop/data.xlsx'


def get_keyword(file_path):
    """
    Get the keywords from the 'KEYWORD' column of the 'Keyword list' sheet
    and return them as a list.
    """
    df = pd.read_excel(file_path, sheet_name='Keyword list')
    keywords = df['KEYWORD'].to_list()
    return keywords


keyword_list = get_keyword(file_path)


def words(phrase: str) -> list[str]:
    """
    Split a string into lowercase words on every character that is not a
    letter, digit or underscore (spaces, commas etc.).
    """
    return list(map(lambda x: x.lower(), filter(len, re.split(r'\W', phrase))))


def compare_title(file_path):
    """
    Get the titles from the 'Product content' sheet and check each one
    against the values in keyword_list.
    """
    df = pd.read_excel(file_path, sheet_name='Product content')
    df = df.fillna('-')
    title = df['TITLE'].apply(lambda find_kw: any([keyword in words(find_kw) for keyword in keyword_list]))
    return title
```
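To make the True/False behaviour reproducible without the Excel file, here is a minimal sketch that applies the same words() tokenizer and any(...) check to a small in-memory DataFrame. The titles and keywords are made-up sample data standing in for the two sheets, not values from the original workbook.

```python
import re

import pandas as pd


def words(phrase: str) -> list[str]:
    """Split a string on every non-word character and lowercase the pieces."""
    return [w.lower() for w in re.split(r"\W", phrase) if w]


# Hypothetical sample data standing in for the two Excel sheets.
keyword_list = ["shoe", "leather"]
df = pd.DataFrame({"TITLE": ["Red Leather Shoe", "Cotton T-Shirt", None]})
df = df.fillna("-")  # same placeholder the question uses for empty cells

# One True/False per row: does any keyword appear among the title's words?
has_match = df["TITLE"].apply(
    lambda title: any(keyword in words(title) for keyword in keyword_list)
)
print(has_match.tolist())  # [True, False, False]
print(words("Cotton T-Shirt"))  # ['cotton', 't', 'shirt']
```

Note that because titles are split into single words, a multi-word keyword such as "leather shoe" can never be found by this check.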
Thanks in advance for your help.
Upvotes: 0
Views: 39
Reputation: 10452
I think this is what you're looking for:
```python
title = df['TITLE'].apply(lambda find_kw: [keyword for keyword in keyword_list if keyword in words(find_kw)])
```
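Here is a self-contained sketch of that idea, again on hypothetical in-memory data instead of data.xlsx. The helper name matching_keywords is made up for illustration. It tokenizes each title once per row rather than once per keyword, and it lowercases the keywords before comparing, on the assumption that the keyword sheet may contain capitalized entries (words() always returns lowercase, so 'Leather' would otherwise never match).

```python
import re

import pandas as pd


def words(phrase: str) -> list[str]:
    """Split a string on every non-word character and lowercase the pieces."""
    return [w.lower() for w in re.split(r"\W", phrase) if w]


def matching_keywords(title: str, keywords: list[str]) -> list[str]:
    """Hypothetical helper: return the keywords that appear as whole words in the title."""
    title_words = set(words(title))  # tokenize once, then use fast set lookups
    return [kw for kw in keywords if kw.lower() in title_words]


# Hypothetical stand-ins for the 'Keyword list' and 'Product content' sheets.
keyword_list = ["Shoe", "leather", "boot"]
df = pd.DataFrame({"TITLE": ["Red Leather Shoe", "Cotton T-Shirt", "-"]})

# Extra keyword arguments to Series.apply are forwarded to the function.
matches = df["TITLE"].apply(matching_keywords, keywords=keyword_list)
print(matches.tolist())  # [['Shoe', 'leather'], [], []]
```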
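This means that each element of the Series returned by compare_title is a list[str] (the matching keywords) instead of a bool. Code that checks a single row, e.g. `if title.iloc[0]:`, still works as before, because an empty list is falsy and a non-empty list is truthy. Note that `if compare_title(...)` on the whole Series raises a ValueError in either version; use `title.map(bool)` if you need the True/False Series back.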
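If you want the old True/False column alongside the matched words, you can derive it from the list column instead of running the comparison twice. A minimal sketch, assuming a Series of lists like the one the list-returning version produces (the sample values here are hypothetical):

```python
import pandas as pd

# Hypothetical result of the list-returning compare_title.
matches = pd.Series([["shoe", "leather"], [], ["boot"]])

# A Series has no single truth value, so `if matches:` would raise ValueError.
# Convert each row's list to True/False instead.
has_match = matches.map(bool)
print(has_match.tolist())  # [True, False, True]

# Join the matched keywords into one readable string per row.
matched_text = matches.map(", ".join)
print(matched_text.tolist())  # ['shoe, leather', '', 'boot']

# explode() gives one row per matched keyword, keeping the original row index;
# empty lists become NaN, which dropna() removes.
print(matches.explode().dropna().tolist())  # ['shoe', 'leather', 'boot']
```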
Upvotes: 1