I would like to scrape data from the site http://www.x-rates.com/table/?from=USD&amount=1 (it is a currency exchange site).
I want to get the word "Euro" from the table, but I get an empty list. Here is my code:
from bs4 import BeautifulSoup
import requests
res = requests.get('http://www.x-rates.com/table/?from=USD&amount=1')
soup = BeautifulSoup(res.text, 'html.parser')
hehe = soup.select('table.ratesTable:nth-child(4) > tbody:nth-child(2) > tr:nth-child(1) > td:nth-child(1)')
print(hehe)
I also tried this:
hehe = soup.select('table.ratesTable + table.ratesTable + table.ratesTable + table.ratesTable table.ratesTable + tbody + tbody + tbody + tr + tr + td + td')
but I still get nothing. What should I change?
If you want to use select, you can use nth-of-type, which bs4 supports, to pull the first td in the table, which is where the first Euro appears:
soup = BeautifulSoup(res.text, 'html.parser')
hee = soup.select(".ratesTable td:nth-of-type(1)")
print(hee)
Output:
[<td>Euro</td>]
If you want to be more specific, you can qualify the class with the tag name (table.class):
print(soup.select("table.ratesTable td:nth-of-type(1)"))
And to get the second Euro:
# 16th row, first td
print(soup.select(".tablesorter.ratesTable tr:nth-of-type(16) td:nth-of-type(1)"))
Output:
[<td>Euro</td>]
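The one-element outputs shown above appear to come from an older bs4 release. Since bs4 4.7, select() is implemented by the soupsieve package, which follows standard CSS semantics: td:nth-of-type(1) matches every td that is the first td among its siblings, i.e. the first cell of each row, not a single cell on the page. A minimal sketch on hypothetical, simplified markup (not the live page); select_one is the bs4 method for taking only the first match:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the live x-rates page is larger and may differ.
html = """
<table class="ratesTable">
  <tr><th>US Dollar</th><th>1.00 USD</th><th>inv. 1.00 USD</th></tr>
  <tr><td>Euro</td><td>0.909064</td><td>1.100033</td></tr>
  <tr><td>British Pound</td><td>0.716854</td><td>1.394983</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Standard CSS: td:nth-of-type(1) is "the first td among its siblings",
# so it matches the first cell of every data row (the header row has only th).
first_cells = [td.get_text() for td in soup.select(".ratesTable td:nth-of-type(1)")]
print(first_cells)  # ['Euro', 'British Pound']

# select_one returns only the first matching Tag (or None if nothing matches).
first = soup.select_one(".ratesTable td")
print(first.get_text())  # Euro

# A specific row: the third tr in the table, then its first td.
pound = soup.select_one(".ratesTable tr:nth-of-type(3) td")
print(pound.get_text())  # British Pound
```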
Or using find:
soup = BeautifulSoup(res.text, 'html.parser')
table = soup.find("table",{"class":"ratesTable"})
print(table.td.text)
Output:
Euro
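Because class is a multi-valued attribute, find and find_all match a table whose class list merely contains the given class; class_ is the keyword form of the {"class": ...} dict used above. A short sketch on hypothetical markup with two tables, one of which also carries tablesorter:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two tables share ratesTable, one also has tablesorter.
html = """
<table class="ratesTable"><tr><td>Euro</td><td>0.909064</td></tr></table>
<table class="tablesorter ratesTable"><tr><td>Argentine Peso</td><td>15.358344</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# class_="ratesTable" matches any table whose class list includes ratesTable.
tables = soup.find_all("table", class_="ratesTable")
print(len(tables))  # 2

# find() returns the first match only; .td is shorthand for .find("td").
euro = soup.find("table", class_="ratesTable").td.get_text(strip=True)
print(euro)  # Euro

# Matching on the extra class picks out the tablesorter table.
sorter_first = soup.find("table", class_="tablesorter").td.get_text(strip=True)
print(sorter_first)  # Argentine Peso
```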
If you run soup.select("table.ratesTable:nth-child(4)") on its own, you will see that it returns nothing, so the problem is in your CSS selector.
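Two general reasons a selector copied from the browser's developer tools can match nothing: :nth-child(n) counts an element's position among all of its element siblings, regardless of tag or class, and browsers insert a tbody into their DOM when rows are written directly inside a table, while html.parser keeps the markup exactly as sent. Whether either applies to the live x-rates page is not verified here; this sketch uses hypothetical markup to show both effects:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with no explicit <tbody> in the source.
html = """
<div id="content">
  <h1>Rates</h1>
  <p>Updated daily</p>
  <table class="ratesTable">
    <tr><th>US Dollar</th></tr>
    <tr><td>Euro</td></tr>
  </table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# nth-child counts all element siblings: h1 is 1st, p is 2nd, the table is 3rd.
wrong_position = soup.select("table.ratesTable:nth-child(4)")
right_position = soup.select("table.ratesTable:nth-child(3)")
print(wrong_position, len(right_position))  # [] 1

# html.parser does not add the tbody a browser would show in its DOM tree.
with_tbody = soup.select("table.ratesTable > tbody > tr")
without_tbody = [td.get_text() for td in soup.select("table.ratesTable tr td")]
print(with_tbody, without_tbody)  # [] ['Euro']
```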
To get all the data:
# two tables
tables = soup.select(".ratesTable")
table_data = {}
cols = [th.text for th in tables[0].find_all("th")]
for table in tables:
    for tr in table.find_all("tr"):
        data = [td.text for td in tr.find_all("td")]
        if data:
            table_data[data[0]] = dict(zip(cols, data))
from pprint import pprint as pp
pp(table_data)
Output:
{u'Argentine Peso': {u'1.00 USD': u'15.358344',
u'US Dollar': u'Argentine Peso',
u'inv. 1.00 USD': u'0.065111'},
u'Australian Dollar': {u'1.00 USD': u'1.388393',
u'US Dollar': u'Australian Dollar',
u'inv. 1.00 USD': u'0.720257'},
u'Bahraini Dinar': {u'1.00 USD': u'0.376989',
u'US Dollar': u'Bahraini Dinar',
u'inv. 1.00 USD': u'2.652595'},
u'Botswana Pula': {u'1.00 USD': u'11.219075',
u'US Dollar': u'Botswana Pula',
u'inv. 1.00 USD': u'0.089134'},
u'Brazilian Real': {u'1.00 USD': u'3.927908',
u'US Dollar': u'Brazilian Real',
u'inv. 1.00 USD': u'0.254588'},
u'British Pound': {u'1.00 USD': u'0.716854',
u'US Dollar': u'British Pound',
u'inv. 1.00 USD': u'1.394983'},
u'Bruneian Dollar': {u'1.00 USD': u'1.403737',
u'US Dollar': u'Bruneian Dollar',
u'inv. 1.00 USD': u'0.712384'},
u'Bulgarian Lev': {u'1.00 USD': u'1.771194',
u'US Dollar': u'Bulgarian Lev',
u'inv. 1.00 USD': u'0.564591'},
u'Canadian Dollar': {u'1.00 USD': u'1.362152',
u'US Dollar': u'Canadian Dollar',
u'inv. 1.00 USD': u'0.734132'},
u'Chilean Peso': {u'1.00 USD': u'689.453282',
u'US Dollar': u'Chilean Peso',
u'inv. 1.00 USD': u'0.001450'},
u'Chinese Yuan Renminbi': {u'1.00 USD': u'6.532590',
u'US Dollar': u'Chinese Yuan Renminbi',
u'inv. 1.00 USD': u'0.153079'},
u'Colombian Peso': {u'1.00 USD': u'3321.597022',
u'US Dollar': u'Colombian Peso',
u'inv. 1.00 USD': u'0.000301'},
u'Croatian Kuna': {u'1.00 USD': u'6.926768',
u'US Dollar': u'Croatian Kuna',
u'inv. 1.00 USD': u'0.144367'},
u'Czech Koruna': {u'1.00 USD': u'24.605774',
u'US Dollar': u'Czech Koruna',
u'inv. 1.00 USD': u'0.040641'},
u'Danish Krone': {u'1.00 USD': u'6.783374',
u'US Dollar': u'Danish Krone',
u'inv. 1.00 USD': u'0.147419'},
u'Emirati Dirham': {u'1.00 USD': u'3.672956',
u'US Dollar': u'Emirati Dirham',
u'inv. 1.00 USD': u'0.272260'},
u'Euro': {u'1.00 USD': u'0.909064',
u'US Dollar': u'Euro',
u'inv. 1.00 USD': u'1.100033'},
u'Hong Kong Dollar': {u'1.00 USD': u'7.770873',
u'US Dollar': u'Hong Kong Dollar',
u'inv. 1.00 USD': u'0.128686'},
u'Hungarian Forint': {u'1.00 USD': u'282.628733',
u'US Dollar': u'Hungarian Forint',
u'inv. 1.00 USD': u'0.003538'},
u'Icelandic Krona': {u'1.00 USD': u'129.157149',
u'US Dollar': u'Icelandic Krona',
u'inv. 1.00 USD': u'0.007743'},
u'Indian Rupee': {u'1.00 USD': u'68.885961',
u'US Dollar': u'Indian Rupee',
u'inv. 1.00 USD': u'0.014517'},
u'Indonesian Rupiah': {u'1.00 USD': u'13420.180741',
u'US Dollar': u'Indonesian Rupiah',
u'inv. 1.00 USD': u'0.000075'},
u'Iranian Rial': {u'1.00 USD': u'30193.236727',
u'US Dollar': u'Iranian Rial',
u'inv. 1.00 USD': u'0.000033'},
u'Israeli Shekel': {u'1.00 USD': u'3.907342',
u'US Dollar': u'Israeli Shekel',
u'inv. 1.00 USD': u'0.255928'},
u'Japanese Yen': {u'1.00 USD': u'112.854369',
u'US Dollar': u'Japanese Yen',
u'inv. 1.00 USD': u'0.008861'},
u'Kazakhstani Tenge': {u'1.00 USD': u'349.948907',
u'US Dollar': u'Kazakhstani Tenge',
u'inv. 1.00 USD': u'0.002858'},
u'Kuwaiti Dinar': {u'1.00 USD': u'0.300490',
u'US Dollar': u'Kuwaiti Dinar',
u'inv. 1.00 USD': u'3.327899'},
u'Latvian Lat': {u'1.00 USD': u'0.638890',
u'US Dollar': u'Latvian Lat',
u'inv. 1.00 USD': u'1.565215'},
u'Libyan Dinar': {u'1.00 USD': u'1.389216',
u'US Dollar': u'Libyan Dinar',
u'inv. 1.00 USD': u'0.719831'},
u'Lithuanian Litas': {u'1.00 USD': u'3.138815',
u'US Dollar': u'Lithuanian Litas',
u'inv. 1.00 USD': u'0.318592'},
u'Malaysian Ringgit': {u'1.00 USD': u'4.215841',
u'US Dollar': u'Malaysian Ringgit',
u'inv. 1.00 USD': u'0.237201'},
u'Mauritian Rupee': {u'1.00 USD': u'35.959724',
u'US Dollar': u'Mauritian Rupee',
u'inv. 1.00 USD': u'0.027809'},
u'Mexican Peso': {u'1.00 USD': u'18.099833',
u'US Dollar': u'Mexican Peso',
u'inv. 1.00 USD': u'0.055249'},
u'Nepalese Rupee': {u'1.00 USD': u'109.959953',
u'US Dollar': u'Nepalese Rupee',
u'inv. 1.00 USD': u'0.009094'},
u'New Zealand Dollar': {u'1.00 USD': u'1.495957',
u'US Dollar': u'New Zealand Dollar',
u'inv. 1.00 USD': u'0.668468'},
u'Norwegian Krone': {u'1.00 USD': u'8.661961',
u'US Dollar': u'Norwegian Krone',
u'inv. 1.00 USD': u'0.115447'},
u'Omani Rial': {u'1.00 USD': u'0.385000',
u'US Dollar': u'Omani Rial',
u'inv. 1.00 USD': u'2.597403'},
u'Pakistani Rupee': {u'1.00 USD': u'104.604918',
u'US Dollar': u'Pakistani Rupee',
u'inv. 1.00 USD': u'0.009560'},
u'Philippine Peso': {u'1.00 USD': u'47.606650',
u'US Dollar': u'Philippine Peso',
u'inv. 1.00 USD': u'0.021005'},
u'Polish Zloty': {u'1.00 USD': u'3.960685',
u'US Dollar': u'Polish Zloty',
u'inv. 1.00 USD': u'0.252482'},
u'Qatari Riyal': {u'1.00 USD': u'3.641295',
u'US Dollar': u'Qatari Riyal',
u'inv. 1.00 USD': u'0.274628'},
u'Romanian New Leu': {u'1.00 USD': u'4.060863',
u'US Dollar': u'Romanian New Leu',
u'inv. 1.00 USD': u'0.246253'},
u'Russian Ruble': {u'1.00 USD': u'75.913328',
u'US Dollar': u'Russian Ruble',
u'inv. 1.00 USD': u'0.013173'},
u'Saudi Arabian Riyal': {u'1.00 USD': u'3.750501',
u'US Dollar': u'Saudi Arabian Riyal',
u'inv. 1.00 USD': u'0.266631'},
u'Singapore Dollar': {u'1.00 USD': u'1.403737',
u'US Dollar': u'Singapore Dollar',
u'inv. 1.00 USD': u'0.712384'},
u'South African Rand': {u'1.00 USD': u'15.547001',
u'US Dollar': u'South African Rand',
u'inv. 1.00 USD': u'0.064321'},
u'South Korean Won': {u'1.00 USD': u'1238.257908',
u'US Dollar': u'South Korean Won',
u'inv. 1.00 USD': u'0.000808'},
u'Sri Lankan Rupee': {u'1.00 USD': u'144.195067',
u'US Dollar': u'Sri Lankan Rupee',
u'inv. 1.00 USD': u'0.006935'},
u'Swedish Krona': {u'1.00 USD': u'8.530904',
u'US Dollar': u'Swedish Krona',
u'inv. 1.00 USD': u'0.117221'},
u'Swiss Franc': {u'1.00 USD': u'0.994570',
u'US Dollar': u'Swiss Franc',
u'inv. 1.00 USD': u'1.005460'},
u'Taiwan New Dollar': {u'1.00 USD': u'33.188318',
u'US Dollar': u'Taiwan New Dollar',
u'inv. 1.00 USD': u'0.030131'},
u'Thai Baht': {u'1.00 USD': u'35.687352',
u'US Dollar': u'Thai Baht',
u'inv. 1.00 USD': u'0.028021'},
u'Trinidadian Dollar': {u'1.00 USD': u'6.515309',
u'US Dollar': u'Trinidadian Dollar',
u'inv. 1.00 USD': u'0.153485'},
u'Turkish Lira': {u'1.00 USD': u'2.922907',
u'US Dollar': u'Turkish Lira',
u'inv. 1.00 USD': u'0.342125'},
u'Venezuelan Bolivar': {u'1.00 USD': u'6.320083',
u'US Dollar': u'Venezuelan Bolivar',
u'inv. 1.00 USD': u'0.158226'}}
You can structure the dict however you prefer, but the logic stays the same.
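For example, since every scraped cell is a string, one alternative structure maps each currency to a pair of floats. A minimal sketch using two entries copied from the output above; to_float_rates and usd_to are hypothetical helper names, and usd_to simply multiplies a USD amount by the "1.00 USD" rate:

```python
# Two entries copied from the output above; keys are the table's header cells.
table_data = {
    "Euro": {"US Dollar": "Euro", "1.00 USD": "0.909064", "inv. 1.00 USD": "1.100033"},
    "British Pound": {"US Dollar": "British Pound", "1.00 USD": "0.716854", "inv. 1.00 USD": "1.394983"},
}

def to_float_rates(table_data):
    """Restructure {currency: {header: text}} into {currency: (per_usd, inverse)} floats."""
    return {
        currency: (float(row["1.00 USD"]), float(row["inv. 1.00 USD"]))
        for currency, row in table_data.items()
    }

def usd_to(rates, currency, amount_usd):
    """Hypothetical helper: convert an amount of USD into `currency` using the scraped rate."""
    per_usd, _inverse = rates[currency]
    return amount_usd * per_usd

rates = to_float_rates(table_data)
print(rates["Euro"])                         # (0.909064, 1.100033)
print(round(usd_to(rates, "Euro", 100), 4))  # 90.9064
```

The inverse column is just the reciprocal of the rate (0.909064 × 1.100033 ≈ 1), so keeping both is a convenience rather than extra information.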
If you just want the tablesorter table:
# one specific table
table = soup.select(".tablesorter.ratesTable")
table_data = {}
cols = [th.text for th in table[0].find_all("th")]
for tr in table[0].find_all("tr"):
    data = [td.text for td in tr.find_all("td")]
    if data:
        table_data[data[0]] = dict(zip(cols, data))
print(table_data)
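The same loop can be wrapped in a small reusable function. This sketch, on hypothetical markup mirroring the two-table layout, uses select_one to take the single matching table instead of indexing the list, and get_text(strip=True) to drop surrounding whitespace that .text would keep; table_to_dict is an illustrative name, not a bs4 API:

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the two-table layout; the live page may differ.
html = """
<table class="ratesTable">
  <tr><th>US Dollar</th><th>1.00 USD</th><th>inv. 1.00 USD</th></tr>
  <tr><td>Euro</td><td>0.909064</td><td>1.100033</td></tr>
</table>
<table class="tablesorter ratesTable">
  <tr><th>US Dollar</th><th>1.00 USD</th><th>inv. 1.00 USD</th></tr>
  <tr><td> Argentine Peso </td><td>15.358344</td><td>0.065111</td></tr>
  <tr><td>Euro</td><td>0.909064</td><td>1.100033</td></tr>
</table>
"""

def table_to_dict(table):
    """Map the first cell of each data row to a {header: cell text} dict."""
    cols = [th.get_text(strip=True) for th in table.find_all("th")]
    result = {}
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has only <th> cells, so it is skipped
            result[cells[0]] = dict(zip(cols, cells))
    return result

soup = BeautifulSoup(html, "html.parser")
# .tablesorter.ratesTable requires both classes, so only the second table matches.
sorter = soup.select_one(".tablesorter.ratesTable")
data = table_to_dict(sorter)
print(sorted(data))  # ['Argentine Peso', 'Euro']
```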
Output:
{u'Argentine Peso': {u'1.00 USD': u'15.324285',
u'US Dollar': u'Argentine Peso',
u'inv. 1.00 USD': u'0.065256'},
u'Australian Dollar': {u'1.00 USD': u'1.388630',
u'US Dollar': u'Australian Dollar',
u'inv. 1.00 USD': u'0.720134'},
u'Bahraini Dinar': {u'1.00 USD': u'0.376989',
u'US Dollar': u'Bahraini Dinar',
u'inv. 1.00 USD': u'2.652595'},
u'Botswana Pula': {u'1.00 USD': u'11.219075',
u'US Dollar': u'Botswana Pula',
u'inv. 1.00 USD': u'0.089134'},
u'Brazilian Real': {u'1.00 USD': u'3.936188',
u'US Dollar': u'Brazilian Real',
u'inv. 1.00 USD': u'0.254053'},
u'British Pound': {u'1.00 USD': u'0.717464',
u'US Dollar': u'British Pound',
u'inv. 1.00 USD': u'1.393799'},
u'Bruneian Dollar': {u'1.00 USD': u'1.403808',
u'US Dollar': u'Bruneian Dollar',
u'inv. 1.00 USD': u'0.712348'},
u'Bulgarian Lev': {u'1.00 USD': u'1.775921',
u'US Dollar': u'Bulgarian Lev',
u'inv. 1.00 USD': u'0.563088'},
u'Canadian Dollar': {u'1.00 USD': u'1.362506',
u'US Dollar': u'Canadian Dollar',
u'inv. 1.00 USD': u'0.733942'},
u'Chilean Peso': {u'1.00 USD': u'691.510617',
u'US Dollar': u'Chilean Peso',
u'inv. 1.00 USD': u'0.001446'},
u'Chinese Yuan Renminbi': {u'1.00 USD': u'6.533541',
u'US Dollar': u'Chinese Yuan Renminbi',
u'inv. 1.00 USD': u'0.153056'},
u'Colombian Peso': {u'1.00 USD': u'3313.262601',
u'US Dollar': u'Colombian Peso',
u'inv. 1.00 USD': u'0.000302'},
u'Croatian Kuna': {u'1.00 USD': u'6.920610',
u'US Dollar': u'Croatian Kuna',
u'inv. 1.00 USD': u'0.144496'},
u'Czech Koruna': {u'1.00 USD': u'24.583134',
u'US Dollar': u'Czech Koruna',
u'inv. 1.00 USD': u'0.040678'},
u'Danish Krone': {u'1.00 USD': u'6.776307',
u'US Dollar': u'Danish Krone',
u'inv. 1.00 USD': u'0.147573'},
u'Emirati Dirham': {u'1.00 USD': u'3.673148',
u'US Dollar': u'Emirati Dirham',
u'inv. 1.00 USD': u'0.272246'},
u'Euro': {u'1.00 USD': u'0.908120',
u'US Dollar': u'Euro',
u'inv. 1.00 USD': u'1.101176'},
u'Hong Kong Dollar': {u'1.00 USD': u'7.771176',
u'US Dollar': u'Hong Kong Dollar',
u'inv. 1.00 USD': u'0.128681'},
u'Hungarian Forint': {u'1.00 USD': u'282.305073',
u'US Dollar': u'Hungarian Forint',
u'inv. 1.00 USD': u'0.003542'},
u'Icelandic Krona': {u'1.00 USD': u'129.154766',
u'US Dollar': u'Icelandic Krona',
u'inv. 1.00 USD': u'0.007743'},
u'Indian Rupee': {u'1.00 USD': u'68.865641',
u'US Dollar': u'Indian Rupee',
u'inv. 1.00 USD': u'0.014521'},
u'Indonesian Rupiah': {u'1.00 USD': u'13422.938587',
u'US Dollar': u'Indonesian Rupiah',
u'inv. 1.00 USD': u'0.000074'},
u'Iranian Rial': {u'1.00 USD': u'30193.236717',
u'US Dollar': u'Iranian Rial',
u'inv. 1.00 USD': u'0.000033'},
u'Israeli Shekel': {u'1.00 USD': u'3.903987',
u'US Dollar': u'Israeli Shekel',
u'inv. 1.00 USD': u'0.256148'},
u'Japanese Yen': {u'1.00 USD': u'112.709992',
u'US Dollar': u'Japanese Yen',
u'inv. 1.00 USD': u'0.008872'},
u'Kazakhstani Tenge': {u'1.00 USD': u'349.948907',
u'US Dollar': u'Kazakhstani Tenge',
u'inv. 1.00 USD': u'0.002858'},
u'Kuwaiti Dinar': {u'1.00 USD': u'0.300490',
u'US Dollar': u'Kuwaiti Dinar',
u'inv. 1.00 USD': u'3.327899'},
u'Latvian Lat': {u'1.00 USD': u'0.638227',
u'US Dollar': u'Latvian Lat',
u'inv. 1.00 USD': u'1.566841'},
u'Libyan Dinar': {u'1.00 USD': u'1.389216',
u'US Dollar': u'Libyan Dinar',
u'inv. 1.00 USD': u'0.719831'},
u'Lithuanian Litas': {u'1.00 USD': u'3.135556',
u'US Dollar': u'Lithuanian Litas',
u'inv. 1.00 USD': u'0.318923'},
u'Malaysian Ringgit': {u'1.00 USD': u'4.217441',
u'US Dollar': u'Malaysian Ringgit',
u'inv. 1.00 USD': u'0.237111'},
u'Mauritian Rupee': {u'1.00 USD': u'35.959724',
u'US Dollar': u'Mauritian Rupee',
u'inv. 1.00 USD': u'0.027809'},
u'Mexican Peso': {u'1.00 USD': u'18.131872',
u'US Dollar': u'Mexican Peso',
u'inv. 1.00 USD': u'0.055152'},
u'Nepalese Rupee': {u'1.00 USD': u'109.959303',
u'US Dollar': u'Nepalese Rupee',
u'inv. 1.00 USD': u'0.009094'},
u'New Zealand Dollar': {u'1.00 USD': u'1.494449',
u'US Dollar': u'New Zealand Dollar',
u'inv. 1.00 USD': u'0.669143'},
u'Norwegian Krone': {u'1.00 USD': u'8.655515',
u'US Dollar': u'Norwegian Krone',
u'inv. 1.00 USD': u'0.115533'},
u'Omani Rial': {u'1.00 USD': u'0.385000',
u'US Dollar': u'Omani Rial',
u'inv. 1.00 USD': u'2.597403'},
u'Pakistani Rupee': {u'1.00 USD': u'104.604918',
u'US Dollar': u'Pakistani Rupee',
u'inv. 1.00 USD': u'0.009560'},
u'Philippine Peso': {u'1.00 USD': u'47.623330',
u'US Dollar': u'Philippine Peso',
u'inv. 1.00 USD': u'0.020998'},
u'Polish Zloty': {u'1.00 USD': u'3.957191',
u'US Dollar': u'Polish Zloty',
u'inv. 1.00 USD': u'0.252704'},
u'Qatari Riyal': {u'1.00 USD': u'3.640748',
u'US Dollar': u'Qatari Riyal',
u'inv. 1.00 USD': u'0.274669'},
u'Romanian New Leu': {u'1.00 USD': u'4.056672',
u'US Dollar': u'Romanian New Leu',
u'inv. 1.00 USD': u'0.246507'},
u'Russian Ruble': {u'1.00 USD': u'76.158926',
u'US Dollar': u'Russian Ruble',
u'inv. 1.00 USD': u'0.013130'},
u'Saudi Arabian Riyal': {u'1.00 USD': u'3.749980',
u'US Dollar': u'Saudi Arabian Riyal',
u'inv. 1.00 USD': u'0.266668'},
u'Singapore Dollar': {u'1.00 USD': u'1.403808',
u'US Dollar': u'Singapore Dollar',
u'inv. 1.00 USD': u'0.712348'},
u'South African Rand': {u'1.00 USD': u'15.576569',
u'US Dollar': u'South African Rand',
u'inv. 1.00 USD': u'0.064199'},
u'South Korean Won': {u'1.00 USD': u'1239.577296',
u'US Dollar': u'South Korean Won',
u'inv. 1.00 USD': u'0.000807'},
u'Sri Lankan Rupee': {u'1.00 USD': u'144.195899',
u'US Dollar': u'Sri Lankan Rupee',
u'inv. 1.00 USD': u'0.006935'},
u'Swedish Krona': {u'1.00 USD': u'8.526837',
u'US Dollar': u'Swedish Krona',
u'inv. 1.00 USD': u'0.117277'},
u'Swiss Franc': {u'1.00 USD': u'0.992590',
u'US Dollar': u'Swiss Franc',
u'inv. 1.00 USD': u'1.007465'},
u'Taiwan New Dollar': {u'1.00 USD': u'33.191630',
u'US Dollar': u'Taiwan New Dollar',
u'inv. 1.00 USD': u'0.030128'},
u'Thai Baht': {u'1.00 USD': u'35.677099',
u'US Dollar': u'Thai Baht',
u'inv. 1.00 USD': u'0.028029'},
u'Trinidadian Dollar': {u'1.00 USD': u'6.515314',
u'US Dollar': u'Trinidadian Dollar',
u'inv. 1.00 USD': u'0.153485'},
u'Turkish Lira': {u'1.00 USD': u'2.923851',
u'US Dollar': u'Turkish Lira',
u'inv. 1.00 USD': u'0.342015'},
u'Venezuelan Bolivar': {u'1.00 USD': u'6.349609',
u'US Dollar': u'Venezuelan Bolivar',
u'inv. 1.00 USD': u'0.157490'}}
It also helps to open your browser's developer tools and look at the element's classes and styles; they give you hints on how to select a particular element.
Pandas is much better suited to scraping tables. For your use case, it takes only a couple of lines of code:
import pandas as pd
df_all = pd.read_html('http://www.x-rates.com/table/?from=USD&amount=1',header=0,attrs={'class':"tablesorter ratesTable"})
df = pd.concat(df_all).reset_index(drop=True)
df.columns = ['currency','to_usd','inv_usd']
df
currency to_usd inv_usd
0 Argentine Peso 15.358513 0.065110
1 Australian Dollar 1.388332 0.720289
2 Bahraini Dinar 0.376989 2.652594
3 Botswana Pula 11.219075 0.089134
4 Brazilian Real 3.927585 0.254609
...
If you only care about the euro, you can get that row from the DataFrame with:
df[df.currency=='Euro']
currency to_usd inv_usd
14 Euro 0.908652 1.100532
You can also pull out just the rate as a number:
df[df.currency=='Euro'].to_usd.values[0]
0.908652
Alternatively, you can get the table's HTML with BeautifulSoup using the following code. Ultimately, though, you will probably want to pull the data into something like pandas to process it, so I would suggest the method above.
from bs4 import BeautifulSoup
import requests
page = requests.get('http://www.x-rates.com/table/?from=USD&amount=1')
soup = BeautifulSoup(page.content, 'html.parser')
tab_html = soup.find_all('table', {'class':"tablesorter ratesTable"})
tab_html
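If you do scrape with BeautifulSoup, the {currency: {header: text}} dict built in the other answer can be handed to pandas directly. A minimal sketch with two entries copied from that answer's output; the column names currency, to_usd and inv_usd follow the read_html example above, and the string cells are converted to floats explicitly:

```python
import pandas as pd

# Two rows in the shape built by the BeautifulSoup loops (values are strings).
table_data = {
    "Euro": {"US Dollar": "Euro", "1.00 USD": "0.909064", "inv. 1.00 USD": "1.100033"},
    "British Pound": {"US Dollar": "British Pound", "1.00 USD": "0.716854", "inv. 1.00 USD": "1.394983"},
}

df = (
    pd.DataFrame.from_dict(table_data, orient="index")  # one row per currency
    .rename(columns={"US Dollar": "currency", "1.00 USD": "to_usd", "inv. 1.00 USD": "inv_usd"})
    .astype({"to_usd": float, "inv_usd": float})         # scraped cells are strings
    .reset_index(drop=True)
)

# Same boolean-mask lookup as above, returning a scalar.
euro_rate = df.loc[df.currency == "Euro", "to_usd"].iloc[0]
print(euro_rate)  # 0.909064
```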