Edwarric

Reputation: 533

Python: Reading a webpage and extracting text from that page

I'm writing a Python script to get exchange rates from the website xe.com/currency/converter (I can't post another link, sorry, I'm at my limit). For example, for the conversion between GBP and USD I would request the URL "http://www.xe.com/currencyconverter/convert/?Amount=1&From=GBP&To=USD", read the value shown on the page, "1.56371 USD" (the rate at the time of writing), and assign it as a number to a variable such as rate_usd. My plan is to fetch that URL with the urllib.request module and search through the result with BeautifulSoup. This is what I have so far:

import urllib.request
from bs4 import BeautifulSoup

def rates_fetcher(url):
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, "html.parser")
    # code to search through soup and fetch the converted value
    # e.g. 1.56371
    # How would I extract this value?
    # I have inspected the page element and found the value I want to be in the class:
    # <td width="47%" align="left" class="rightCol">1.56371&nbsp;
    # I'm thinking about searching through the class: class="rightCol"
    # and extracting the value that way, but how?
url1 = "http://www.xe.com/currencyconverter/convert/?Amount=1&From=GBP&To=USD"
rates_fetcher(url1)

Any help would be much appreciated, and thank you whoever took the time to read this.

p.s. Sorry in advance if I have made any typos, I'm kinda' in a hurry :s

Upvotes: 3

Views: 3792

Answers (2)

hbrls

Reputation: 2160

Try pyquery. It's a lot better than Soup.

PS: Instead of urllib, try Requests ("HTTP for Humans").

PS2: In the end, I actually use Node with jQuery (or a jQuery-like library) for HTML scraping.

Upvotes: 0

jgysland

Reputation: 345

It sounds like you've got the right idea.

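import urllib.request
from bs4 import BeautifulSoup

def rates_fetcher(url):
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, "html.parser")
    # Collect the text of every tag whose class is 'rightCol'
    return [item.text for item in soup.find_all(class_='rightCol')]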

That should do it... This will return a list of the text inside any tag with the class 'rightCol'.
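
Each string in that list will still carry the non-breaking space and the currency code (for example '1.56371\xa0USD'), so it needs one more step before it can be stored in something like rate_usd. Here is a minimal sketch of that step. Note that parse_rate is a hypothetical helper name, and stripping commas as thousands separators is my assumption about how larger amounts might be formatted on the page:

```python
import re

def parse_rate(cell_text):
    """Return the first number found in a string such as '1.56371\xa0USD' as a float."""
    # Assumption: commas, if present, are thousands separators and can be dropped.
    cleaned = cell_text.replace(",", "")
    # Match an integer or decimal number; the trailing '\xa0USD' is ignored.
    match = re.search(r"\d+(?:\.\d+)?", cleaned)
    if match is None:
        raise ValueError(f"no number found in {cell_text!r}")
    return float(match.group())

rate_usd = parse_rate("1.56371\xa0USD")
print(rate_usd)  # 1.56371
```

Since the rate has a fractional part, float is the type to use here; converting with int would either fail on the string or throw the decimals away.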

If you haven't read through the Beautiful Soup documentation, you really oughtta. It's straightforward and very useful.

Upvotes: 3
