Elizabeth

Reputation: 71

How can I extract a two-digit number from a sentence using a regular expression?

I am trying to write a function that extracts only the two-digit integer matched by a specific regular expression.

import re

def extract_number(message_text):
    regex_expression = 'What are the top ([0-9]{2}) trends on facebook'
    regex= re.compile(regex_expression)
    matches = regex.finditer(message_text)
    for match in matches:
        return match.group()

    # if there were no matches, return None
    return None

So that when I print

message_text= 'What are the top 54 trends on facebook today'
print(extract_number(message_text))

I will get the number 54. If I use the version below instead, I get whatever characters I put in place of (.+). Why won't it work for numbers?

def extract_number(message_text):
    regex_expression = 'What are the top (.+) trends on facebook'
    regex= re.compile(regex_expression)
    matches = regex.finditer(message_text)
    for match in matches:
        return match.group()

message_text= 'What are the top fifty trends on facebook today'
print(extract_number(message_text))

Upvotes: 0

Views: 85

Answers (2)

mklement0

Reputation: 439193

The only problem with both your snippets is that you're not returning the capture-group result of interest, but the overall match:

return match.group()

is the same as return match.group(0), i.e., it returns the overall match, which in your case is the whole sentence that the pattern matched rather than just the number.

By contrast, you want index 1, i.e., what the 1st capture group - the first subexpression enclosed in (...), ([0-9]{2}) - matched:

return match.group(1)
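
To see the difference concretely, here is a minimal sketch that uses the pattern and sample sentence from the question. The variable names (pattern, overall, number) are only illustrative.

```python
import re

# Pattern and input taken from the question.
pattern = re.compile(r'What are the top ([0-9]{2}) trends on facebook')
match = pattern.search('What are the top 54 trends on facebook today')

overall = match.group()   # same as match.group(0): everything the pattern matched
number = match.group(1)   # only what the 1st capture group matched

print(overall)  # What are the top 54 trends on facebook
print(number)   # 54
```

Note that the overall match stops at "facebook": the trailing " today" is not part of the pattern, so it is not part of the match either.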

To put it all together:

import re

def extract_number(message_text):
    regex_expression = 'What are the top ([0-9]{2}) trends on facebook'
    regex= re.compile(regex_expression)
    matches = regex.finditer(message_text)
    # (See bottom of this answer for a loop-less alternative.)
    for match in matches:
        return match.group(1)  # index 1 returns what the 1st capture group matched

    # if there were no matches, return None
    return None

message_text= 'What are the top 54 trends on facebook today'
print(extract_number(message_text))

This yields the desired output:

54

Note: As @EvanL00 points out, since only the first match is ever needed, calling regex.finditer() and then returning unconditionally from the first iteration of a for loop is unnecessary and may obscure the intent of the code; the simpler and clearer approach is:

match = regex.search(message_text) # Get first match only.
if match:
    return match.group(1)
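
Folded back into the function, the loop-less version might look like the sketch below. Compiling the pattern once at module level and the name TRENDS_PATTERN are assumptions made for illustration; they are not part of the original code.

```python
import re

# Hypothetical module-level name; compiled once so repeated calls reuse it.
TRENDS_PATTERN = re.compile(r'What are the top ([0-9]{2}) trends on facebook')

def extract_number(message_text):
    # search() returns the first match anywhere in the string, or None.
    match = TRENDS_PATTERN.search(message_text)
    if match:
        return match.group(1)  # text of the 1st capture group, as a string
    # No match: return None explicitly.
    return None

print(extract_number('What are the top 54 trends on facebook today'))     # 54
print(extract_number('What are the top fifty trends on facebook today'))  # None
```

The function returns the digits as a string; wrap the result in int() if you need an actual number.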

Upvotes: 1

Alex Fung

Reputation: 2006

This should work for both digits and words:

import re

def extract_number(message_text):
    regex_expression = 'What are the top ([a-zA-Z0-9]+) trends on facebook'
    regex= re.compile(regex_expression)
    matches = regex.findall(message_text)
    if matches:
        return matches[0]

message_text= 'What are the top fifty trends on facebook today'
print(extract_number(message_text))
message_text= 'What are the top 50 trends on facebook today'
print(extract_number(message_text))
message_text= 'What are the top -- trends on facebook today'
print(extract_number(message_text))

Output:

fifty
50
None
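
The reason matches[0] is just the captured word is that, when a pattern contains exactly one capture group, findall() returns a list of that group's text rather than the full matches. A small sketch of that behavior, with illustrative variable names:

```python
import re

pattern = re.compile(r'What are the top ([a-zA-Z0-9]+) trends on facebook')

# With a single capture group, findall() returns the group's text per match.
found_word = pattern.findall('What are the top fifty trends on facebook today')
found_none = pattern.findall('What are the top -- trends on facebook today')

print(found_word)  # ['fifty']
print(found_none)  # []
```

An empty list is falsy, which is why the function above falls through and returns None for the "--" input.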

Upvotes: 0
