Reputation: 242
I am trying to bring this table into a pandas DataFrame. I've tried using pandas read_html
, I've tried using requests and bs4. I would like to grab the whole table as we see it, but in the html code the table is broken into 3 blocks. Still, I have not figured out how to pin down each one of them.
Here is a starting code:
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = 'http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-sistema-pregao-enUS.asp'
response = requests.get(url, params={'Data': '08/01/2018', 'Mercadoria': 'DI1'})
soup = BeautifulSoup(response.text, 'html.parser')
Upvotes: 1
Views: 513
Reputation: 473843
One important thing that may help you solve it is the fact that the response you get with requests
here does not really contain the rendered table
element, but it definitely contains the desired data.
The problem is, the page requires javascript to have the table rendered. You may notice that your data is inside a script
element:
<script type="text/javascript">
var MercFut0 = "";
var MercFut1 = "";
var MercFut2 = "";
var MercFut3 = "";
MercFut0 = MercFut0 + '<table class="secondary">';
MercFut0 = MercFut0 + '<tr><td></td></tr>';
MercFut1 = MercFut1 + '<table class="secondary" id="teste">';
MercFut1 = MercFut1 + '<tr style="height: 120px;">';
MercFut2 = MercFut2 + '<table class="secondary" id="teste">';
MercFut2 = MercFut2 + '<tr style="height: 120px;">';
MercFut2 = MercFut2 + '<th class="text-center">Open Interest opening*</th>';
MercFut2 = MercFut2 + '<th class="text-center">Open Interest closing**</th>';
...
MercFut1 = MercFut1 + '</tr>';
MercFut0 = MercFut0 + '</table>';
MercFut1 = MercFut1 + '</table>';
MercFut2 = MercFut2 + '</table>';
MercFut3 = MercFut3 + '</table>';
MercadoFut0.innerHTML = MercFut3;
MercadoFut1.innerHTML = MercFut0;
MercadoFut2.innerHTML = MercFut1;
tableShow(2,false);
tableShow(9,false);
</script>
At this point, the easiest approach would probably be to use smth like selenium
to simply has this page rendered.
Or, you may try getting this script and executing it with, for example, pyexecjs
.
Something along these lines:
import execjs
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-sistema-pregao-enUS.asp'
response = requests.get(url, params={'Data': '08/01/2018', 'Mercadoria': 'DI1'})
soup = BeautifulSoup(response.text, 'html.parser')
# compile the desired table html from the data
script = soup.find("script", text=lambda text: text and 'tableShow' in text and "<table" in text).get_text()
script = """
var MercadoFut0 = {},
MercadoFut1 = {},
MercadoFut2 = {};
var tableShow = function () {};
function getTables() {
%s
return [MercFut1, MercFut2, MercFut3];
}
""" % script
ctx = execjs.compile(script)
table1, table2, table3 = ctx.call("getTables")
# parse tables into dataframes
df1 = pd.read_html(table1)[0]
df2 = pd.read_html(table2)[0]
df3 = pd.read_html(table3)[0]
print(df1)
print(df2)
print(df3)
Then, you can "concat" df1
and df2
which should get you the desired table:
df = pd.concat([df2, df1], axis=1)
pd.set_option('display.expand_frame_repr', False)
print(df)
Prints:
0 1 2 3 4 0 1 2 3 4 5 6 7 8 9 10
0 Open Interest opening* Open Interest closing** Number of Trades Trading Volume Financial Volume (R$) Previous Settlement*** Indexed Settlement**** Opening Price Minimum Price Maximum Price Average Price Last Price Settlement Price Change Last Bid Last Offer
1 442761 442761 0 0 0 99999.99 - 0.000 0.000 0.000 0.000 0.000 100000.00 0.01+ 0.000 0.000
2 760332 792464 147 114160 11351487370 99434.83 99434.83 6.404 6.403 6.410 6.406 6.407 99434.72 0.11- 6.404 6.408
3 2343218 2377609 183 99890 9885888562 98967.40 98967.40 6.429 6.421 6.435 6.423 6.425 98967.53 0.13+ 0.000 6.425
...
38 72923 74133 280 3920 126375887 32296.91 32296.91 11.510 11.450 11.600 11.521 11.580 32063.31 233.60- 11.590 11.610
39 5325 5325 0 0 0 28878.71 28878.71 0.000 0.000 0.000 0.000 0.000 28649.55 229.16- 0.000 11.680
Upvotes: 1