Reputation: 25
I am trying to write a Python script that uses the "urllib" and "re" libraries to extract weather forecast information from an HTML page, but I cannot get any values returned. Could anybody help me?
import urllib
import re
url = ('http://www.metoffice.gov.uk/public/weather/forecast/gcptz5sys')
htmlfile = urllib.urlopen(url)
htmltext = htmlfile.read()
regex =('<span title="Maximum daytime temperature" data-c="10" data-f="50">(.+?)<sup>°C</sup></span>')
pattern = re.compile(regex)
temp = re.findall(pattern,htmltext)
print (temp)
I am using Python 2.7 by the way...
Upvotes: 1
Views: 120
Reputation: 2975
Try this:
#!/usr/bin/env python
import urllib
import re

def main():
    url = ('http://www.metoffice.gov.uk/public/weather/forecast/gcptz5sys')
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    htmltext = str(htmltext).replace('\n', '')
    htmltext = str(htmltext).replace('\t', '')
    htmltext = str(htmltext).replace(' ', '')
    pattern = re.compile('<spantitle="Maximumdaytimetemperature"data-c="7"data-f="45">(?P<temperature>.+?)<sup>°C</sup></span>')
    for match in pattern.finditer(htmltext):
        print match.group('temperature')

if __name__ == "__main__":
    main()
What I did here: I removed all whitespace characters (newlines, tabs and spaces) from the HTML before matching, because the backend can change the whitespace dynamically, and you would otherwise have to change your regex every time. By removing all whitespace and newline characters you can avoid this problem.
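One more thing to watch: both patterns in this thread hardcode the `data-c` and `data-f` values (10/50 in the question, 7/45 above), so they only match on a day when the maximum is exactly that temperature. A minimal sketch of a pattern that tolerates both changing values and changing whitespace is below. It assumes the markup looks like the spans quoted in this thread; `SAMPLE_HTML`, `MAX_TEMP` and `max_temperatures` are made-up names for illustration, and the sample string stands in for the real page, which may differ.

```python
import re

# Hypothetical sample modelled on the markup quoted in this thread;
# the real page may differ.
SAMPLE_HTML = '''
<span title="Maximum daytime temperature"
      data-c="10" data-f="50">10<sup>&deg;C</sup></span>
<span title="Maximum daytime temperature" data-c="-2"   data-f="28">-2<sup>&deg;C</sup></span>
'''

# \s+ accepts any run of spaces, tabs or newlines between attributes,
# and -?\d+ accepts any integer instead of one hardcoded temperature.
MAX_TEMP = re.compile(
    r'<span\s+title="Maximum daytime temperature"\s+'
    r'data-c="(?P<celsius>-?\d+)"\s+data-f="(?P<fahrenheit>-?\d+)"'
)

def max_temperatures(html):
    """Return a list of (celsius, fahrenheit) integer pairs found in html."""
    return [(int(m.group('celsius')), int(m.group('fahrenheit')))
            for m in MAX_TEMP.finditer(html)]

print(max_temperatures(SAMPLE_HTML))  # [(10, 50), (-2, 28)]
```

To use it on the live page, pass the `htmltext` you read with `urllib.urlopen(url).read()` to `max_temperatures` without stripping the whitespace first. If the attribute order or names change on the site, the pattern will need adjusting, so an HTML parser may be the more durable choice.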
Upvotes: 1