Reputation: 2961
I have this list of keywords to use:
keywords = ['a', 'about', 'advance', 'advanced', 'affect', 'after', 'ameliorate', 'among', 'and', 'any', 'apply', 'are', 'as', 'at', 'be', 'been', 'better', 'fix', 'fixed', 'following', 'for', 'form', 'from', 'from a', 'further', 'get', 'got', 'have', 'having', 'help', 'hike', 'hold', 'i', 'impact', 'improve', 'in', 'why', 'will', 'with', 'work with', 'would', 'you', 'your', 'of',]
I am using simple sentences such as these:
'risk to healthy and fitness'
'risk of healthy and fitness'
My code is this:
import re

keywords = keywords
def Searchy():
    name = 'risk to healthy and fitness'
    name33 = ['exercise', 'fit', 'fitness', 'cardio',]  # standard words
    regex1 = re.compile(r'\b(%s+.])\b' % '|'.join(name33))
    regex2 = re.compile(r'\b(%s+.)\b' % '|'.join(keywords))
    h = [m.start() for m in re.finditer(regex1one, name)]
    name55 = [name[h[0]:]][0]
    print name55
I want to filter out most of the clutter words and get only the part of the string that starts at the first keyword, with a result such as:
'to healthy and fitness'
If my first keyword is 'of', I get a correct string such as:
'of healthy and fitness'
If my first keyword is any word other than 'of', I get this instead:
'healthy and fitness'
I want the result to be the same for every keyword. What could I be doing wrong, and how do I get it right?
Upvotes: 0
Views: 147
Reputation: 403
I think your issue is in regex1. You build it from name33, so the search looks for the words in that list and gives you everything from the first match onward. When I change it to name, it gives the correct output.
import re

def Searchy():
    keywords = ['a', 'about', 'advance', 'advanced', 'affect', 'after', 'ameliorate', 'among', 'and', 'any', 'apply', 'are', 'as', 'at', 'be', 'been', 'better', 'fix', 'fixed', 'following', 'for', 'form', 'from', 'from a', 'further', 'get', 'got', 'have', 'having', 'help', 'hike', 'hold', 'i', 'impact', 'improve', 'in', 'why', 'will', 'with', 'work with', 'would', 'you', 'your', 'of',]
    name = 'risk to healthy and fitness'
    name33 = ['exercise', 'fit', 'fitness', 'cardio',]  # standard words
    regex1 = re.compile(r'\b(%s+.])\b' % '|'.join(name))  # built from name instead of name33
    regex2 = re.compile(r'\b(%s+.)\b' % '|'.join(keywords))
    h = [m.start() for m in re.finditer(regex1, name)]  # regex1, not regex1one
    name55 = [name[h[0]:]][0]
    print(name55)

Searchy()
Also, you have regex1one in your h assignment, which is never defined. I changed it to regex1.
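For reference, here is a small sketch of what str.join produces for a list versus a plain string, since that decides what the alternation inside each regex looks like. The shortened name value and the variable names list_pattern and str_pattern are only for illustration.

```python
name = 'risk to'
name33 = ['exercise', 'fit', 'fitness', 'cardio']

# Joining a list gives one alternative per word.
list_pattern = '|'.join(name33)

# Joining a string gives one alternative per character, spaces included.
str_pattern = '|'.join(name)

print(list_pattern)  # exercise|fit|fitness|cardio
print(str_pattern)   # r|i|s|k| |t|o
```

So a pattern built from name matches single characters of the sentence itself rather than whole words from a list.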
Upvotes: 2
Reputation: 935
Your code works exactly as you wrote it:
"If my first keyword is 'of' i get a correct string"
Yes, because 'of' is indeed in your keyword list.
"If my first keyword is any other word used instead of 'of', i get this instead"
Yes, because in the examples you gave, the only words before 'healthy and fitness' are 'risk', 'to' and 'of', and of those only 'of' is in the keyword list you provided. If you want the same result for the 'to' example, you'll need to add 'to' to the keyword list.
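A minimal sketch of that fix, assuming the goal is to return the sentence from its first keyword onward. The helper name from_first_keyword and the shortened keyword list are illustrative and not from the original post. Using re.escape and trying longer keywords first are extra assumptions, added so that multi-word entries such as 'from a' are tried before 'from'.

```python
import re

# Shortened keyword list for illustration; 'to' is added as suggested above.
keywords = ['a', 'about', 'and', 'for', 'from', 'from a', 'in', 'of', 'with', 'to']

def from_first_keyword(sentence, keywords):
    """Return the sentence starting at its first keyword, or None if none occurs."""
    # Longer alternatives go first so 'from a' is tried before 'from' and 'a'.
    alternatives = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(r'\b(?:%s)\b' % '|'.join(re.escape(k) for k in alternatives))
    match = pattern.search(sentence)
    return sentence[match.start():] if match else None

print(from_first_keyword('risk to healthy and fitness', keywords))  # to healthy and fitness
print(from_first_keyword('risk of healthy and fitness', keywords))  # of healthy and fitness
```

The \b anchors keep short keywords such as 'a' or 'in' from matching inside longer words, and search stops at the leftmost keyword, so both sentences are cut at the same position.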
Upvotes: 0