Shan

Reputation: 381

How do I get numerical data while web scraping?

I'm completely new to web scraping, so any reference sites would be great. I'm slightly confused about how to get at the actual data. When I print(theText), I get a bunch of HTML code (which should be correct). How exactly do I go about getting values out of it? Do I have to use regular expressions to get the actual numerical data?

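import urllib.request

def getData():
    # Request the 5-day forecast page
    request = urllib.request.Request("http://www.weather.com/weather/5day/l/USGA0028:1:US")
    response = urllib.request.urlopen(request)
    the_page = response.read()   # raw bytes of the response body
    theText = the_page.decode()  # bytes -> str (UTF-8 by default)
    print(theText)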

Upvotes: 0

Views: 345

Answers (2)

plasmid0h

Reputation: 206

No, you shouldn't use regular expressions to parse HTML. Instead, have a look at BeautifulSoup4.

Upvotes: 0

Lawrence Benson

Reputation: 1406

Have a look at BeautifulSoup. It lets you look up elements by their IDs or tags, which makes it very useful for basic scraping.
You pass the response text (the HTML page) to BeautifulSoup and then call its methods on the resulting object.
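
As a rough sketch of what that could look like: the HTML below is made up for illustration (the `forecast` id and `temp` class are assumptions, not weather.com's real markup), and `extract_temps` is a hypothetical helper name. Open the page source in your browser to find the actual tags and ids you need. In your code you would pass `theText` instead of `SAMPLE_HTML`.

```python
import re
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical markup for illustration only; the real page will differ.
SAMPLE_HTML = """
<html><body>
  <div id="forecast">
    <span class="temp">72&deg;F</span>
    <span class="temp">65&deg;F</span>
  </div>
</body></html>
"""

def extract_temps(html):
    """Return the integer temperatures found inside the (assumed) forecast block."""
    # "html.parser" is the parser bundled with Python, so no extra install is needed.
    soup = BeautifulSoup(html, "html.parser")

    # Look the container up by its id; find() returns None if it is missing.
    forecast = soup.find(id="forecast")
    if forecast is None:
        return []

    temps = []
    # Look the value elements up by tag name and CSS class.
    for span in forecast.find_all("span", class_="temp"):
        text = span.get_text(strip=True)
        # A regex is fine here: it runs on the extracted text, not on the HTML.
        match = re.search(r"-?\d+", text)
        if match:
            temps.append(int(match.group()))
    return temps

print(extract_temps(SAMPLE_HTML))
```

So BeautifulSoup handles locating the elements, and a small regex (or plain `int()`/`float()` if the text is already clean) turns the element text into numbers.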

Upvotes: 5
