Reputation: 71
I am trying to write a function that extracts only the two-digit integer from text matching a specific regular expression.
import re

def extract_number(message_text):
    regex_expression = 'What are the top ([0-9]{2}) trends on facebook'
    regex = re.compile(regex_expression)
    matches = regex.finditer(message_text)
    for match in matches:
        return match.group()
    # if there were no matches, return None
    return None
So that when I print
message_text= 'What are the top 54 trends on facebook today'
print(extract_number(message_text))
I will get the number 54. If I write the following instead, I get whatever characters I put in place of (.+). Why won't it work for numbers?
def extract_number(message_text):
    regex_expression = 'What are the top (.+) trends on facebook'
    regex = re.compile(regex_expression)
    matches = regex.finditer(message_text)
    for match in matches:
        return match.group()
message_text= 'What are the top fifty trends on facebook today'
print(extract_number(message_text))
Upvotes: 0
Views: 85
Reputation: 439193
The only problem with both of your snippets is that you're returning the overall match rather than the capture-group result you're interested in:
return match.group()
is the same as return match.group(0), i.e., it returns the overall match, which in your case is everything the pattern matched ('What are the top 54 trends on facebook'), not just the number.
By contrast, you want index 1, i.e., what the 1st capture group (the first subexpression enclosed in (...), here ([0-9]{2})) matched:
return match.group(1)
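To make the difference between the two indexes concrete, here is a minimal sketch that uses the pattern and sample text from the question. The variable names pattern, whole, and number are only illustrative.

```python
import re

# Pattern and sample text are taken from the question.
pattern = re.compile(r'What are the top ([0-9]{2}) trends on facebook')
match = pattern.search('What are the top 54 trends on facebook today')

whole = match.group(0)   # text matched by the entire pattern (same as match.group())
number = match.group(1)  # text matched by the first (...) capture group

print(whole)   # What are the top 54 trends on facebook
print(number)  # 54
```

Note that group(1) returns the string '54'; wrap it in int() if you need an actual integer.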
To put it all together:
import re

def extract_number(message_text):
    regex_expression = 'What are the top ([0-9]{2}) trends on facebook'
    regex = re.compile(regex_expression)
    matches = regex.finditer(message_text)
    # (See bottom of this answer for a loop-less alternative.)
    for match in matches:
        return match.group(1)  # index 1 returns what the 1st capture group matched
    # if there were no matches, return None
    return None
message_text= 'What are the top 54 trends on facebook today'
print(extract_number(message_text))
This yields the desired output:
54
Note: As @EvanL00 points out, given that only one match is ever needed, using regex.finditer() with a subsequent for loop that unconditionally returns in the first iteration is unnecessary and may obscure the intent of the code; the simpler and clearer approach is:
match = regex.search(message_text)  # Get first match only.
if match:
    return match.group(1)
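For completeness, the loop-less fragment could be placed into the full function roughly as follows. This is a sketch, not the only way to arrange it; the module-level name NUMBER_PATTERN is hypothetical and not part of the original code.

```python
import re

# Hypothetical module-level constant: compile the pattern once and reuse it.
NUMBER_PATTERN = re.compile(r'What are the top ([0-9]{2}) trends on facebook')

def extract_number(message_text):
    # search() returns the first match object, or None if nothing matches.
    match = NUMBER_PATTERN.search(message_text)
    if match:
        return match.group(1)  # only the two digits, as a string
    return None

print(extract_number('What are the top 54 trends on facebook today'))     # 54
print(extract_number('What are the top fifty trends on facebook today'))  # None
```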
Upvotes: 1
Reputation: 2006
This should work for both digits and words:
import re

def extract_number(message_text):
    regex_expression = 'What are the top ([a-zA-Z0-9]+) trends on facebook'
    regex = re.compile(regex_expression)
    matches = regex.findall(message_text)
    if matches:
        return matches[0]
message_text= 'What are the top fifty trends on facebook today'
print(extract_number(message_text))
message_text= 'What are the top 50 trends on facebook today'
print(extract_number(message_text))
message_text= 'What are the top -- trends on facebook today'
print(extract_number(message_text))
Output:
fifty
50
None
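The reason matches[0] is already the captured text here, rather than the whole phrase, is that findall() returns the contents of the capture group when the pattern contains exactly one group. A small sketch of that behaviour, with illustrative variable names:

```python
import re

pattern = re.compile(r'What are the top ([a-zA-Z0-9]+) trends on facebook')

# With exactly one capture group, findall() returns a list of the group's
# strings, not the full matches.
found = pattern.findall('What are the top 50 trends on facebook today')
print(found)    # ['50']

# No match yields an empty list, which is falsy, so the function above
# falls through and implicitly returns None.
missing = pattern.findall('What are the top -- trends on facebook today')
print(missing)  # []
```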
Upvotes: 0