Reputation: 5
I am trying to extract the table data from this page.
I tried bs4 and Selenium, but the table data does not appear in the page source. I also tried Selenium's wait mode, and that did not work either.
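from selenium import webdriver
from selenium.webdriver.common.by import By
url = 'https://www.rad.cvm.gov.br/ENETCONSULTA/frmGerenciaPaginaFRE.aspx?NumeroSequencialDocumento=82594&CodigoTipoInstituicao=2'
driver = webdriver.Safari()
driver.get(url)
# find_element_by_tag_name was removed in Selenium 4; use find_element(By.TAG_NAME, ...)
iframe = driver.find_element(By.TAG_NAME, 'iframe')
driver.switch_to.frame(iframe)
# Print the iframe's HTML to check whether the table is there
print(driver.page_source)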
Upvotes: 0
Views: 97
Reputation: 99
pandas can help you out here. This is what I did. The output looks better, though.
You may need to install lxml first (the leading ! is for a notebook cell; drop it in a regular shell):
!pip3 install lxml
Then:
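from io import StringIO
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
url = 'https://www.rad.cvm.gov.br/ENETCONSULTA/frmGerenciaPaginaFRE.aspx?NumeroSequencialDocumento=82594&CodigoTipoInstituicao=2'
driver = webdriver.Chrome()
driver.get(url)
# The table lives inside an iframe, so switch into it before reading the source.
# find_element_by_tag_name was removed in Selenium 4; use find_element(By.TAG_NAME, ...)
iframe = driver.find_element(By.TAG_NAME, 'iframe')
driver.switch_to.frame(iframe)
# read_html returns one DataFrame per <table>; newer pandas versions expect a
# file-like object rather than a raw HTML string, hence StringIO
dfs = pd.read_html(StringIO(driver.page_source))
print(dfs[0].head())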
#output
         0                                                  1  \
0    Conta                                          Descrição
1     3.01               Receitas da Intermediação Financeira
2  3.01.01                     Receita de Juros e Rendimentos
3  3.01.02                              Receita de Dividendos
4  3.01.03  Resultado de Operações de Câmbio e Variação Ca...

                         2                        3
0  01/01/2019 a 31/03/2019  01/01/2018 a 31/03/2018
1               17.010.000               16.856.000
2                6.142.000                5.973.000
3                      NaN                      NaN
4                  303.000                 -145.000
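In the output above, the real column names end up in row 0 and the amounts are strings that use "." as the thousands separator. If you want a cleaner frame, something like the sketch below might work. It assumes the first row is always the header and that every column from the third onward holds amounts; `promote_header_and_parse` is just a name I made up, and `raw` stands in for `dfs[0]` using the rows shown above. I clean the value columns by hand instead of passing `thousands='.'` to `read_html`, because account codes such as 3.01 could otherwise be read as numbers.

```python
import pandas as pd

# Stand-in for dfs[0]: the rows are copied from the output shown above.
raw = pd.DataFrame([
    ["Conta", "Descrição", "01/01/2019 a 31/03/2019", "01/01/2018 a 31/03/2018"],
    ["3.01", "Receitas da Intermediação Financeira", "17.010.000", "16.856.000"],
    ["3.01.01", "Receita de Juros e Rendimentos", "6.142.000", "5.973.000"],
    ["3.01.02", "Receita de Dividendos", float("nan"), float("nan")],
    ["3.01.03", "Resultado de Operações de Câmbio e Variação Ca...", "303.000", "-145.000"],
])


def promote_header_and_parse(df, first_value_col=2):
    """Hypothetical helper: use row 0 as the header and parse the amount columns.

    Assumes amounts are strings with '.' as the thousands separator.
    """
    out = df.iloc[1:].reset_index(drop=True).copy()
    out.columns = df.iloc[0].tolist()
    for col in out.columns[first_value_col:]:
        # Remove the thousands separators, then convert; missing cells stay missing.
        cleaned = out[col].str.replace(".", "", regex=False)
        out[col] = pd.to_numeric(cleaned, errors="coerce")
    return out


table = promote_header_and_parse(raw)
print(table)
```

The account codes in the Conta column are left as text on purpose, so 3.01.01 stays 3.01.01.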
Upvotes: 2