Reputation: 15
I am trying to find a way to check a list of queries in a data set against specific lists. For example, here is my list:
season = ['winter','spring','summer','fall','autumn']
Sample Query set:
fall crafts for kids
winter crafts for kids
spring crafts for kids
fall craft ideas for kids
summer crafts for kids
autumn crafts for kids
easy winter crafts for kids
spring craft ideas for kids
fun summer crafts for kids
winter craft ideas for kids
Output:
Fall
Winter
Spring
Fall
Summer
Autumn
Winter
Spring
Summer
Winter
I'm able to tag each query with the list name, so for example:
Top Queries Volume Intent
fall crafts for kids 33100 Season
winter crafts for kids 2900 Season
spring crafts for kids 1600 Season
fall craft ideas for kids 1000 Season
summer crafts for kids 1000 Season
autumn crafts for kids 880 Season
easy winter crafts for kids 880 Season
spring craft ideas for kids 880 Season
fun summer crafts for kids 480 Season
winter craft ideas for kids 480 Season
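But I would like to map each query to the specific item in the list that it matches (for example 'fall'), as in the Output above, rather than just to the list name. How can this be done?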
Upvotes: 0
Views: 71
Reputation: 42143
You could use a regular expression built from your list of keywords:
import re
seasons = ['winter','spring','summer','fall','autumn']
pattern = re.compile(r"\b("+"|".join(map(re.escape,seasons))+r")\b")
qSet = """fall crafts for kids
winter crafts for kids
spring crafts for kids
fall craft ideas for kids
summer crafts for kids
autumn crafts for kids
easy winter crafts for kids
spring craft ideas for kids
fun summer crafts for kids
winter craft ideas for kids""".split("\n")
for q, s in zip(qSet, map(pattern.findall, qSet)):
    print(q, ":", *s)
fall crafts for kids : fall
winter crafts for kids : winter
spring crafts for kids : spring
fall craft ideas for kids : fall
summer crafts for kids : summer
autumn crafts for kids : autumn
easy winter crafts for kids : winter
spring craft ideas for kids : spring
fun summer crafts for kids : summer
winter craft ideas for kids : winter
The regular expression in pattern selects any of the keywords as a whole word in the sentence. For example, the expression '\b(winter|spring|summer|fall|autumn)\b' will not pick up 'fall' in 'watching rainbows under the waterfall in summer'.
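To make the whole-word behaviour concrete, here is a minimal sketch. The helper name find_seasons and the re.IGNORECASE flag are my additions, not part of the answer above; the flag is only an assumption that mixed-case queries such as 'Fall crafts' should match too.

```python
import re

seasons = ['winter', 'spring', 'summer', 'fall', 'autumn']
# Same pattern as in the answer, plus re.IGNORECASE (an assumption, see above).
pattern = re.compile(
    r"\b(" + "|".join(map(re.escape, seasons)) + r")\b",
    re.IGNORECASE,
)

def find_seasons(query):
    """Return the seasons found as whole words in query, lower-cased."""
    return [match.lower() for match in pattern.findall(query)]

# 'waterfall' is skipped because there is no word boundary before 'fall'.
print(find_seasons("watching rainbows under the waterfall in summer"))  # ['summer']
print(find_seasons("Fall crafts for kids"))  # ['fall']
```

findall returns every match, so a query that names two seasons yields a two-element list.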
Upvotes: 0
Reputation: 9619
Or you can just use list comprehension:
seasons = ["winter", "spring", "summer", "fall", "autumn"]
queries = [
"fall crafts for kids",
"winter crafts for kids",
"spring crafts for kids",
"fall craft ideas for kids",
"sumMer crafts for kids",
"autumn crafts for kids",
"easy winter crafts for kids",
"spring craft ideas for kids",
"fun summer crafts for kids",
"winter craft ideas for kids",
]
hits = [i.lower() for l in queries for i in l.split() for x in seasons if i.lower() == x]
result:
['fall','winter','spring','fall','summer','autumn','winter','spring','summer','winter']
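Note that the flat list above no longer records which query each hit came from, so a query with no season (or with two) would shift the alignment. If that association matters, one possible variation builds a dict keyed by query instead. This is a sketch rather than the answerer's code, and the name query_hits is hypothetical.

```python
seasons = ["winter", "spring", "summer", "fall", "autumn"]
queries = [
    "fall crafts for kids",
    "sumMer crafts for kids",
    "easy winter crafts for kids",
    "crafts for kids",
]

# Map each query to the season words it contains (whole words, case-insensitive).
query_hits = {
    query: [word.lower() for word in query.split() if word.lower() in seasons]
    for query in queries
}

for query, found in query_hits.items():
    print(query, "->", found)
```

A query without any season keeps its place in the mapping and simply gets an empty list.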
Upvotes: 0
Reputation: 169022
Sure thing.
We can define a neat little function that takes a query string and an iterable of strings, and returns all strings from that iterable which are found in the query.
def find_matching_keywords(query, keywords):
    return {keyword for keyword in keywords if keyword in query}
Then let's plug in some data...
seasons = ["winter", "spring", "summer", "fall", "autumn"]
queries = [
"fall crafts for kids",
"winter crafts for kids",
"spring crafts for kids",
"fall craft ideas for kids",
"sumMer crafts for kids",
"autumn crafts for kids",
"easy winter crafts for kids",
"spring craft ideas for kids",
"fun summer crafts for kids",
"winter craft ideas for kids",
]
and map over the queries with a dictionary comprehension (note I'm lower-casing the query to make the matching case-insensitive):
query_to_keywords = {
    query: find_matching_keywords(query.lower(), seasons)
    for query in queries
}
and finally we can print things out (you'd probably do something other than just print these, but for the sake of illustration...):
for query, keywords in query_to_keywords.items():
    print(query, keywords)
The output is
fall crafts for kids {'fall'}
winter crafts for kids {'winter'}
spring crafts for kids {'spring'}
fall craft ideas for kids {'fall'}
sumMer crafts for kids {'summer'}
autumn crafts for kids {'autumn'}
easy winter crafts for kids {'winter'}
spring craft ideas for kids {'spring'}
fun summer crafts for kids {'summer'}
winter craft ideas for kids {'winter'}
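One thing to be aware of: keyword in query is a substring test on the string, so 'fall' would also be found in a query containing 'waterfall'. If only whole-word matches are wanted, a possible variant compares against the query's whitespace-separated words instead. This is a sketch, and the function name find_matching_whole_words is hypothetical.

```python
def find_matching_whole_words(query, keywords):
    """Return the keywords that appear as whole words in query."""
    # split() breaks on whitespace only; punctuation is not stripped.
    words = set(query.split())
    return {keyword for keyword in keywords if keyword in words}

seasons = ["winter", "spring", "summer", "fall", "autumn"]
print(find_matching_whole_words("a waterfall in summer", seasons))  # {'summer'}
```

As with the original function, the caller is expected to lower-case the query first if case-insensitive matching is wanted.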
If you need several categories of keywords (e.g. seasons, adjectives, ...), you might extend seasons to a dict mapping those categories to keyword lists:
category_to_keywords = {
    "season": ["winter", "spring", "summer", "fall", "autumn"],
    "difficulty": ["easy", "hard"],
}
Then, an additional function to map over that...
def find_matching_keywords_with_categories(query, category_to_keywords):
    unfiltered_result = {
        category: find_matching_keywords(query, keywords)
        for category, keywords
        in category_to_keywords.items()
    }
    return {
        category: keywords
        for (category, keywords)
        in unfiltered_result.items()
        if keywords
    }
and when called, à la
query_to_keywords = {
    query: find_matching_keywords_with_categories(query.lower(), category_to_keywords)
    for query in queries
}
we'll end up printing out
fall crafts for kids {'season': {'fall'}}
winter crafts for kids {'season': {'winter'}}
spring crafts for kids {'season': {'spring'}}
fall craft ideas for kids {'season': {'fall'}}
sumMer crafts for kids {'season': {'summer'}}
autumn crafts for kids {'season': {'autumn'}}
easy winter crafts for kids {'season': {'winter'}, 'difficulty': {'easy'}}
spring craft ideas for kids {'season': {'spring'}}
fun summer crafts for kids {'season': {'summer'}}
winter craft ideas for kids {'season': {'winter'}}
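If the goal is a table like the one in the question, with the matched item next to each query, the nested result can be flattened into one row per match. The following is only a sketch under that assumption; the name rows and the tab-separated printing are illustrative choices, not part of the answer.

```python
def find_matching_keywords(query, keywords):
    # Same substring test as in the answer above.
    return {keyword for keyword in keywords if keyword in query}

category_to_keywords = {
    "season": ["winter", "spring", "summer", "fall", "autumn"],
    "difficulty": ["easy", "hard"],
}
queries = ["fall crafts for kids", "easy winter crafts for kids"]

# One (query, category, keyword) tuple per match; sorted() keeps the order stable.
rows = [
    (query, category, keyword)
    for query in queries
    for category, keywords in category_to_keywords.items()
    for keyword in sorted(find_matching_keywords(query.lower(), keywords))
]

for row in rows:
    print(*row, sep="\t")
```

A query matching keywords in two categories produces two rows, which is usually easier to load into a spreadsheet or data frame than a nested dict.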
Upvotes: 1