Reputation:
I'm trying to read data from a website that contains only text. I'd like to read only the data that follows "&values". I've been able to open the entire website, but I don't know how to get rid of the extraneous data and I don't know any HTML. Any help would be much appreciated.
Upvotes: 1
Views: 287
Reputation: 879561
The contents of that URL look like URL query parameters. You could use urlparse.parse_qs
to parse them into a dict that maps each parameter name to a list of its values:
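import urllib2
import urlparse

# Python 2: fetch the page; its body is formatted like a query string.
url = 'http://www.tip.it/runescape/gec/price_graph.php?avg=1&start=1327715574&mainitem=10350&item=10350'
response = urllib2.urlopen(url)
content = response.read()
# parse_qs returns a dict mapping each name to a list of values.
params = urlparse.parse_qs(content)
print(params['values'])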
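For anyone on Python 3, the same idea can be sketched with urllib.parse.parse_qs. The sample string below is a made-up stand-in for the page body (the real response may look different), so treat this as an illustration rather than the exact output. Also note that in Python 3, urllib.request.urlopen(url).read() returns bytes, so you would decode it before parsing.

```python
from urllib.parse import parse_qs

# Hypothetical sample of the response body; the real page content may differ.
sample = "&start=1327715574&values=33900000,33900000,34000000&item=10350"

# parse_qs splits on '&' and '=', giving {name: [value, ...]}.
params = parse_qs(sample)

# 'values' holds one comma-separated string; split it and convert to ints.
values = [int(v) for v in params["values"][0].split(",")]
print(values)
```

Each entry in the dict is a list because a name can appear more than once in a query string, which is why the [0] is needed before splitting.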
Upvotes: 3
Reputation: 37259
You may want to look into the re module (although if you eventually move on to parsing HTML, regex is not the best tool for that). Here is a basic example that grabs the text after &values= and returns the run of digits, commas, and whitespace that follows it:
>>> import re
>>> import urllib2
>>> url = 'http://www.tip.it/runescape/gec/price_graph.php?avg=1&start=1327715574&mainitem=10350&item=10350'
>>> contents = urllib2.urlopen(url).read()
>>> values = re.findall(r'&values=([\d,\s]*)', contents)
>>> values[0].split(',')
['33900000', '33900000', '33900000', #continues....]
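If you want numbers rather than strings, a small follow-up step can convert them. This is a rough Python 3 sketch run against a made-up sample string instead of the live page (the sample contents are an assumption), using re.search so that a missing &values field gives an empty list instead of an IndexError.

```python
import re

# Hypothetical sample of the response body; the real page content may differ.
sample = "&start=1327715574&values=33900000, 33900000,34000000&item=10350"

# Capture the digits, commas, and whitespace that follow '&values='.
match = re.search(r"&values=([\d,\s]*)", sample)

if match:
    # Skip empty pieces (e.g. from a trailing comma); int() tolerates spaces.
    values = [int(v) for v in match.group(1).split(",") if v.strip()]
else:
    values = []

print(values)
```

The capture stops at the next & because that character is not in the [\d,\s] class, so the fields after values are left out.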
Upvotes: 2