user1709173

Reading data from a website

I'm trying to read data from a website that contains only text. I'd like to read only the data that follows "&values". I've been able to open the entire website, but I don't know how to get rid of the extraneous data and I don't know any HTML. Any help would be much appreciated.

Upvotes: 1

Views: 287

Answers (2)

unutbu

Reputation: 879561

The contents of that URL look like URL query parameters. You could use urlparse.parse_qs to parse them into a dict, where each key maps to a list of values:

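# Python 2: urllib2 and urlparse are standard-library modules
import urllib2
import urlparse

url = 'http://www.tip.it/runescape/gec/price_graph.php?avg=1&start=1327715574&mainitem=10350&item=10350'
response = urllib2.urlopen(url)
content = response.read()
# parse_qs splits on '&' and '=' and maps each key to a list of strings
params = urlparse.parse_qs(content)
print(params['values'])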
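
If you are on Python 3, the same function lives in urllib.parse. Here is a minimal sketch of the parsing step only, run on a made-up sample string rather than the real response (the `sample` text and the `extract_values` helper are assumptions for illustration, not what the site actually returns):

```python
from urllib.parse import parse_qs


def extract_values(content):
    """Return the numbers in the 'values' field as a list of ints."""
    params = parse_qs(content)
    # parse_qs maps each key to a list of strings; take the first entry,
    # or an empty string if the key is missing
    raw = params.get('values', [''])[0]
    return [int(v) for v in raw.split(',') if v.strip()]


# Hypothetical response text, shaped like "key=value&key=value"
sample = "&foo=1&values=33900000,33900000,34000000&bar=x"
prices = extract_values(sample)
```

Note that in Python 3 the result of urlopen(...).read() is bytes, so you would likely need to decode it to a str (for example with .decode('utf-8')) before passing it to a helper like this.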

Upvotes: 3

RocketDonkey

Reputation: 37259

You may want to look into the re module (although if you do eventually move to parsing HTML, regex is not the best tool for it). Here is a basic example that finds the text after &values= and captures the run of digits, commas, and whitespace that follows:

>>> import re
>>> import urllib2
>>> url = 'http://www.tip.it/runescape/gec/price_graph.php?avg=1&start=1327715574&mainitem=10350&item=10350'
>>> contents = urllib2.urlopen(url).read()
>>> values = re.findall(r'&values=([\d,\s]*)', contents)
>>> values[0].split(',')
['33900000', '33900000', '33900000', #continues....]
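
The same idea can be wrapped in a small function that also copes with the marker being absent. This is only a sketch on a made-up sample string (the `sample` text and the `values_after_marker` name are assumptions for illustration); it uses the same pattern as above:

```python
import re


def values_after_marker(text):
    """Return the comma-separated items that follow '&values=' as strings."""
    # Capture digits, commas, and whitespace; the match stops at the next '&'
    match = re.search(r'&values=([\d,\s]*)', text)
    if match is None:
        return []
    # Drop surrounding whitespace and skip empty pieces
    return [v.strip() for v in match.group(1).split(',') if v.strip()]


# Hypothetical response text for illustration
sample = "&foo=1&values=33900000, 33900000,34000000&bar=x"
items = values_after_marker(sample)
```

Using re.search instead of re.findall avoids the IndexError that values[0] would raise when nothing matches.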

Upvotes: 2
