Reputation: 22440
I've written a script in Python that uses a regular expression to find phone numbers on two different sites. When I test the pattern below locally against the two phone numbers, it works flawlessly. However, when I run it against the websites, it no longer works: it only fetches two unrelated numbers, 1999 and 8211.
This is what I've tried so far:
import requests, re
links = [
    'http://www.latamcham.org/contact-us/',
    'http://www.cityscape.com.sg/?page_id=37'
]
def FetchPhone(site):
    res = requests.get(site).text
    phone = re.findall(r"\+?[\d]+\s?[\d]+\s?[\d]+", res)[0]  # I'm not sure if it is an ideal pattern. Works locally though
    print(phone)
if __name__ == '__main__':
    for link in links:
        FetchPhone(link)
The output I wish to have:
+65 6881 9083
+65 93895060
This is what I meant by locally:
import re
phonelist = "+65 6881 9083,+65 93895060"
phone = [item for item in re.findall(r"\+?[\d]+\s?[\d]+\s?[\d]+",phonelist)]
print(phone) #it can print them
Postscript: the phone numbers are not generated dynamically. When I print the response text, I can see the numbers in the console.
Upvotes: 1
Views: 63
Reputation: 405
You are using \d+\s?\d+, which will match 9 9, 99 and 1999, because the + quantifier lets the first \d+ grab as many digits as it can while leaving at least one digit for the others. One solution is to state the specific number of repetitions you want (like in Andersson's answer).
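To make the backtracking visible, here is a small sketch. The capture groups and the sample page fragment are my own additions for illustration, not something taken from the real pages:

```python
import re

# The question's pattern with capture groups added (the groups are only
# here to show how the digits get split between the three \d+ parts).
grouped = re.compile(r"\+?(\d+)\s?(\d+)\s?(\d+)")

# A bare four-digit number still matches: the first \d+ gives digits back
# until each of the other two parts has at least one.
year_groups = grouped.search("1999").groups()
print(year_groups)  # ('19', '9', '9')

# Hypothetical page fragment with a year appearing before the phone number.
sample = "<p>Copyright 1999</p><p>Tel: +65 6881 9083</p>"
loose_matches = re.findall(r"\+?\d+\s?\d+\s?\d+", sample)
print(loose_matches)  # ['1999', '+65 6881 9083']
```

Since the script takes element [0] of the findall result, any earlier run of three or more digits on the page wins over the actual phone number.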
I suggest you try regex101.com. It highlights matches, which helps you visualize what the regex is matching and capturing. You can paste an example of the text you want to search there and tweak your regex.
Upvotes: 0
Reputation: 52665
In your case, the regex below should return the required output:
r"\+\d{2}\s\d{4}\s?\d{4}"
Note that it only applies to the number formats mentioned in the question and might not work in other cases.
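A quick way to check this without hitting the sites is to run the pattern on a sample string. The helper name find_phones and the sample text are assumptions made up for this sketch; the real pages may look different:

```python
import re

# Pattern from this answer: '+', two-digit country code, one whitespace,
# four digits, optional whitespace, four digits.
PHONE_RE = re.compile(r"\+\d{2}\s\d{4}\s?\d{4}")

def find_phones(text):
    """Return every substring of text that matches PHONE_RE."""
    return PHONE_RE.findall(text)

# Hypothetical page text containing both formats plus some noise.
sample = "Copyright 1999 | Tel: +65 6881 9083 | Mobile: +65 93895060 | Fax: 6881-9083"
phones = find_phones(sample)
print(phones)  # ['+65 6881 9083', '+65 93895060']
```

In the question's script you could presumably pass requests.get(site).text to find_phones instead of the sample string. The year and the fax number are skipped here only because they lack the leading + and country code.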
Upvotes: 1