lobjc

Reputation: 2961

re.compile not working well

I have this list of keywords to use:

keywords = ['a', 'about', 'advance', 'advanced', 'affect', 'after', 'ameliorate', 'among', 'and', 'any', 'apply', 'are', 'as', 'at', 'be', 'been', 'better', 'fix', 'fixed', 'following', 'for', 'form', 'from', 'from a', 'further', 'get', 'got', 'have', 'having', 'help', 'hike', 'hold', 'i', 'impact', 'improve', 'in',  'why', 'will', 'with', 'work with', 'would', 'you', 'your', 'of',]

I am using simple sentences such as these:

'risk to healthy and fitness'
'risk of healthy and fitness'

My code is this:

keywords = keywords

def Searchy():
    name = 'risk to healthy and fitness'
    name33 = ['exercise','fit','fitness','cardio',]#standard words
    regex1 = re.compile(r'\b(%s+.])\b'%'|'.join(name33))
    regex2 = re.compile(r'\b(%s+.)\b'%'|'.join(keywords))
    h = [m.start()for m in re.finditer (regex1one,name)]
    name55 = [name[h[0]:]][0]
    print name55

I want to filter out most of the clutter words and just get the string starting from the first keyword, with a result such as:

'to healthy and fitness'

If my first keyword is 'of', I get a correct string such as:

'of healthy and fitness'

If my first keyword is any other word instead of 'of', I get this instead:

'healthy and fitness'

I want the results to be consistent for all keywords. What could I be doing wrong, and how do I get it right?

Upvotes: 0

Views: 147

Answers (2)

Superspork

Reputation: 403

I think your issue is in regex1. You pass name33, so the pattern is built from that list and the result is everything from the first match onward. When I change it to name, it gives the correct output.

import re

def Searchy():
    keywords = ['a', 'about', 'advance', 'advanced', 'affect', 'after', 'ameliorate', 'among', 'and', 'any', 'apply', 'are', 'as', 'at', 'be', 'been', 'better', 'fix', 'fixed', 'following', 'for', 'form', 'from', 'from a', 'further', 'get', 'got', 'have', 'having', 'help', 'hike', 'hold', 'i', 'impact', 'improve', 'in',  'why', 'will', 'with', 'work with', 'would', 'you', 'your', 'of',]
    name = 'risk to healthy and fitness'
    name33 = ['exercise','fit','fitness','cardio',]#standard words
    regex1 = re.compile(r'\b(%s+.])\b'%'|'.join(name))
    regex2 = re.compile(r'\b(%s+.)\b'%'|'.join(keywords))
    h = [m.start()for m in re.finditer (regex1,name)]
    name55 = [name[h[0]:]][0]
    print name55

Searchy()

Also, you have regex1one in your h statement; I changed it to regex1.
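
To check what each variant of regex1 actually searches for, it can help to print the pattern string before compiling it. The following is only a minimal sketch in Python 3 syntax (the code above is Python 2), and the names from_list and from_string are hypothetical. It shows that joining a list with '|' separates whole words, joining a string separates its individual characters, and the trailing `+.]` attaches only to the last alternative:

```python
name = 'risk to healthy and fitness'
name33 = ['exercise', 'fit', 'fitness', 'cardio']

# Joining a list puts '|' between whole words.
from_list = r'\b(%s+.])\b' % '|'.join(name33)

# Joining a string puts '|' between its individual characters
# (only the first four characters are used to keep the output short).
from_string = r'\b(%s+.])\b' % '|'.join(name[:4])

print(from_list)    # \b(exercise|fit|fitness|cardio+.])\b
print(from_string)  # \b(r|i|s|k+.])\b
```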

Upvotes: 2

Jimmy Lee Jones

Reputation: 935

Your code works exactly as you wrote it:

If my first keyword is 'of' i get a correct string

Yes, because 'of' is indeed in your keyword list.

If my first keyword is any other word used instead of 'of', i get this instead

Yes, because in the examples you gave, the only words before 'healthy and fitness' are 'risk', 'to' and 'of', and of those only 'of' is in the keyword list you provided. If you want the same result for the sentence with 'to', you'll need to add 'to' to the keyword list.
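
A minimal sketch of that fix, in Python 3 syntax (the question's code is Python 2). The helper name from_first_keyword and the shortened keyword list are hypothetical and only for illustration. 'to' is added to the list, each keyword is escaped, and the alternatives sit in a non-capturing group so that the `\b` anchors apply to every keyword rather than just the first and last:

```python
import re

# Hypothetical shortened keyword list; 'to' has been added.
keywords = ['and', 'for', 'from', 'from a', 'of', 'to', 'with', 'work with']

def from_first_keyword(sentence, words):
    """Return the sentence starting at its first keyword, or None if there is none."""
    # Try longer keywords first so 'from a' is preferred over 'from'.
    alternatives = sorted(words, key=len, reverse=True)
    pattern = re.compile(
        r'\b(?:%s)\b' % '|'.join(re.escape(w) for w in alternatives)
    )
    match = pattern.search(sentence)
    return sentence[match.start():] if match else None

print(from_first_keyword('risk to healthy and fitness', keywords))  # to healthy and fitness
print(from_first_keyword('risk of healthy and fitness', keywords))  # of healthy and fitness
```

With 'to' in the list, both sentences are cut at their first keyword. The function returns None when the sentence contains no keyword, which avoids the IndexError that h[0] would raise on an empty list.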

Upvotes: 0
