SIM

Reputation: 22440

Regex used within Python giving unexpected results

I've written a script in Python that uses a regular expression to find phone numbers on two different sites. When I tried the pattern below on a local string, it found both phone numbers flawlessly. However, when I try the same pattern on the websites, it no longer works: it only fetches two unexpected numbers, 1999 and 8211.

This is what I've tried so far:

import re
import requests

links=[
    'http://www.latamcham.org/contact-us/',
    'http://www.cityscape.com.sg/?page_id=37'
    ]

def FetchPhone(site):
    res = requests.get(site).text
    phone = re.findall(r"\+?[\d]+\s?[\d]+\s?[\d]+", res)[0]  # I'm not sure it is an ideal pattern, but it works locally
    print(phone)

if __name__ == '__main__':
    for link in links:
        FetchPhone(link)

The output I wish to have:

+65 6881 9083
+65 93895060

This is what I meant by locally:

import re

phonelist = "+65 6881 9083,+65 93895060"

phone = re.findall(r"\+?[\d]+\s?[\d]+\s?[\d]+", phonelist)
print(phone)  # prints ['+65 6881 9083', '+65 93895060']

P.S. The phone numbers are not generated dynamically: when I print the page text, I can see the numbers in the console.

Upvotes: 1

Views: 63

Answers (2)

izxle

Reputation: 405

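Your pattern is built from \d+\s?\d+ pieces, which match 9 9, 99 and 1999 alike, because the + quantifier lets the first \d+ grab as many digits as it can while leaving at least one digit for the next one. Since \+? and \s? are optional as well, the full pattern \+?[\d]+\s?[\d]+\s?[\d]+ matches any run of three or more digits, and re.findall(...)[0] returns the first such run in the page source, which is not necessarily the phone number. One solution is to state the exact number of repetitions you want (as in Andersson's answer).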
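A small reproduction of this effect, run on a hypothetical HTML snippet rather than the real pages. The snippet is an assumption: it uses the entity &#8211; (an en dash, common in WordPress page titles) and a year as stand-ins for whatever stray digit runs the real pages contain; 8211 is plausibly where one of the unexpected numbers came from, but that is a guess.

```python
import re

# Pattern from the question: the "+" and both spaces are optional,
# so any run of three or more digits is enough for a match.
QUESTION_PATTERN = r"\+?[\d]+\s?[\d]+\s?[\d]+"

# Hypothetical page source: an HTML entity and a year appear before the phone number.
html = '<title>Contact &#8211; Us</title><p>Since 1999</p><p>Tel: +65 6881 9083</p>'

matches = re.findall(QUESTION_PATTERN, html)
print(matches)     # ['8211', '1999', '+65 6881 9083']
print(matches[0])  # 8211, i.e. what FetchPhone would print for this snippet
```

The phone number is found, but only as the third match, so indexing with [0] picks up the stray digits instead.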

I suggest trying regex101.com: it highlights what the regex matches and captures, so you can paste a sample of the text you want to search and tweak the regex until it matches only what you want.
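If you prefer to stay in Python, a rough equivalent is to print every match together with its position and some surrounding text, which shows where in the page source each match comes from. This is a minimal sketch: the helper show_matches and the sample HTML are hypothetical, not part of the question.

```python
import re


def show_matches(pattern, text, context=15):
    """Print each match of pattern in text with its offsets and nearby text; return the matches."""
    found = []
    for m in re.finditer(pattern, text):
        start, end = m.span()
        # Slice a little text on either side so you can see where the match sits.
        snippet = text[max(0, start - context):end + context]
        print(f"{m.group()!r} at {start}-{end}: ...{snippet}...")
        found.append(m.group())
    return found


# Hypothetical page fragment; with the real site you would pass the fetched page text.
html = '<title>Contact &#8211; Us</title><p>Tel: +65 6881 9083</p>'
matches = show_matches(r"\+?[\d]+\s?[\d]+\s?[\d]+", html)
```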

Upvotes: 0

Andersson

Reputation: 52665

In your case, the regex below should return the required output:

r"\+\d{2}\s\d{4}\s?\d{4}"

Note that it is tailored to the two formats mentioned in the question:

  • +65 6881 9083
  • +65 93895060

and might not work for other formats.
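A minimal sketch of how this pattern could slot into the question's script. To keep it runnable offline, it searches hypothetical HTML strings instead of calling requests; the names PHONE_PATTERN and extract_phone are illustrative, not from either answer.

```python
import re

# Pattern from this answer: "+", a two-digit country code, a space,
# four digits, an optional space, four more digits.
PHONE_PATTERN = re.compile(r"\+\d{2}\s\d{4}\s?\d{4}")


def extract_phone(text):
    """Return the first substring matching PHONE_PATTERN, or None if there is none."""
    match = PHONE_PATTERN.search(text)
    return match.group() if match else None


# Hypothetical page fragments standing in for the fetched HTML.
samples = [
    '<title>Contact &#8211; Us</title><p>Since 1999</p><p>Tel: +65 6881 9083</p>',
    '<p>Mobile: +65 93895060</p>',
    '<p>No phone number on this page</p>',
]

for sample in samples:
    print(extract_phone(sample))
```

In the question's FetchPhone, this would mean replacing the findall line with print(extract_phone(res)). Using search instead of findall(...)[0] also means a page without a match yields None rather than raising IndexError.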

Upvotes: 1
