Reputation: 57
I managed to parse the URL into a list, but when I pass that list to pd.DataFrame() the data gets scrambled. Can you help me find where I went wrong?
This is what I've scraped:
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs

# currency table URL
URL = "https://www.xe.com/currencytables/?from=USD&date=2019-05-01"
data = requests.get(URL).text
# parse the HTML
soup = bs(data, "html.parser")
# find the tables you want (the slice keeps only the first one)
table = soup.find_all("table")[0:1]
# read it into pandas
FXrate = pd.read_html(str(table))
FXrate
This works.
The problem occurs when I run:
FXrate = pd.DataFrame(FXrate)
FXrate
As far as I understand, I only converted the list to a DataFrame, but the table no longer displays correctly.
Upvotes: 2
Views: 57
Reputation: 46469
Just one side note: read_html works with <table>, <tr>, and <td> tags. That is how it recognizes tables.
This markup will work:
<table>
<tbody>
<tr>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
</tr>
</tbody>
</table>
It will not work on div-based tables, though. This markup will not work:
<div class="divTable">
<div class="divTableBody">
<div class="divTableRow">
<div class="divTableCell"> </div>
<div class="divTableCell"> </div>
</div>
<div class="divTableRow">
<div class="divTableCell"> </div>
<div class="divTableCell"> </div>
</div>
</div>
</div>
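The difference can be checked offline with a small sketch. The two HTML snippets and the values in them are made up for illustration, and the example assumes the lxml parser is installed (it is requested explicitly with flavor="lxml").

```python
from io import StringIO

import pandas as pd

# Hypothetical markup: a real <table> element with a header row.
table_html = """
<table>
  <tr><th>Currency</th><th>Rate</th></tr>
  <tr><td>USD</td><td>1.0</td></tr>
  <tr><td>EUR</td><td>0.889216</td></tr>
</table>
"""

# Hypothetical markup: similar data laid out with <div> elements only.
div_html = """
<div class="divTable">
  <div class="divTableRow">
    <div class="divTableCell">USD</div>
    <div class="divTableCell">1.0</div>
  </div>
</div>
"""

# read_html finds the <table> and returns a list with one DataFrame.
tables = pd.read_html(StringIO(table_html), flavor="lxml")
print(len(tables))
print(tables[0])

# With no <table> element in the input, read_html raises ValueError.
try:
    pd.read_html(StringIO(div_html), flavor="lxml")
    div_parsed = True
except ValueError as exc:
    div_parsed = False
    print("read_html failed:", exc)
```

Wrapping the strings in StringIO avoids passing literal HTML to read_html, which recent pandas versions warn about.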
Upvotes: 0
Reputation: 863501
You can pass the URL directly to read_html, which returns a list of DataFrames, and select the first one by indexing with [0]:
URL = "https://www.xe.com/currencytables/?from=USD&date=2019-05-01"
FXrate = pd.read_html(URL)[0]
print(FXrate.head())
   Currency code ▲▼   Currency name ▲▼  Units per USD  USD per Unit
0               USD          US Dollar       1.000000      1.000000
1               EUR               Euro       0.889216      1.124586
2               GBP      British Pound       0.764041      1.308830
3               INR       Indian Rupee      69.564191      0.014375
4               AUD  Australian Dollar       1.420778      0.703840
If you need the second table:
FXrate = pd.read_html(URL)[1]
print(FXrate.head())
    Currency       Rate  Unnamed: 2
0  EUR / USD    1.11483           ▼
1  GBP / EUR    1.13897           ▼
2  USD / JPY  110.13300           ▼
3  GBP / USD    1.26976           ▼
4  USD / CHF    1.01103           ▼
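To see why the indexing matters without hitting the network, here is a minimal sketch. The page_html string is a hypothetical stand-in for the real page, reusing a few values from the output above, and it assumes lxml is installed. The point is that read_html returns a plain Python list, so each table has to be pulled out by position rather than wrapped in pd.DataFrame().

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; not the actual xe.com markup.
page_html = """
<table>
  <tr><th>Currency code</th><th>Units per USD</th></tr>
  <tr><td>USD</td><td>1.000000</td></tr>
  <tr><td>EUR</td><td>0.889216</td></tr>
</table>
<table>
  <tr><th>Currency</th><th>Rate</th></tr>
  <tr><td>EUR / USD</td><td>1.11483</td></tr>
  <tr><td>GBP / EUR</td><td>1.13897</td></tr>
</table>
"""

# One list entry per <table> element found in the input.
tables = pd.read_html(StringIO(page_html), flavor="lxml")
print(type(tables).__name__, len(tables))

# Each entry is already a DataFrame, so no extra conversion is needed.
first = tables[0]
second = tables[1]
print(first.head())
print(second.head())
```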
Upvotes: 1