Reputation: 25
I'm having trouble scraping a web page in Python with the Requests and lxml libraries.
I need to capture the information highlighted in yellow on this page: http://www.b3.com.br/pt_br/market-data-e-indices/indices/indices-amplos/indice-ibovespa-ibovespa-composicao-da-carteira.htm. However, my XPath query returns an empty list: []
Could someone please help me?
My code is below:
from lxml import html
import requests
page = requests.get('http://www.b3.com.br/pt_br/market-data-e-indices/indices/indices-amplos/indice-ibovespa-ibovespa-composicao-da-carteira.htm')
tree = html.fromstring(page.content)
cod = tree.xpath('//*[@id="divContainerIframeB3"]/div/div[1]/form/div[2]/div/table/tbody/tr[1]/td[1]')
print('The code is : ', cod)
Upvotes: 1
Views: 120
Reputation: 195478
The data is loaded via JavaScript from an external source, so it is not in the HTML that Requests downloads. You can use this script to load the JSON data directly:
import json
import base64
import requests

api_url = "https://sistemaswebb3-listados.b3.com.br/indexProxy/indexCall/GetPortfolioDay/{encoded_string}"

page = 1
index = "IBOV"

# Query parameters for one page of the index portfolio
s = {
    "language": "pt-br",
    "pageNumber": page,
    "pageSize": 20,
    "index": index,
    "segment": "1",
}

# The parameters go into the URL path as a base64-encoded string
encoded_string = base64.b64encode(str(s).encode("utf-8")).decode("utf-8")

# Note: verify=False skips TLS certificate verification
data = requests.get(
    api_url.format(encoded_string=encoded_string),
    verify=False,
).json()

# uncomment this to get all data:
# print(json.dumps(data, indent=4))

for result in data["results"]:
    print(
        "{:<8} {:<15} {:15}".format(
            result["cod"], result["asset"], result["theoricalQty"]
        )
    )
Prints:
ABEV3    AMBEV S/A       4.355.174.839
ASAI3    ASSAI           157.635.935
AZUL4    AZUL            327.283.207
BTOW3    B2W DIGITAL     201.549.295
B3SA3    B3              1.930.877.944
BBSE3    BBSEGURIDADE    671.584.841
BRML3    BR MALLS PAR    843.728.684
BBDC3    BRADESCO        1.261.986.269
BBDC4    BRADESCO        4.687.814.597
BRAP4    BRADESPAR       222.075.664
BBAS3    BRASIL          1.283.197.221
BRKM5    BRASKEM         264.640.575
BRFS3    BRF SA          811.759.800
BPAC11   BTGP BANCO      263.871.572
CRFB3    CARREFOUR BR    391.758.726
CCRO3    CCR SA          1.115.695.556
CMIG4    CEMIG           969.723.092
HGTX3    CIA HERING      126.186.408
CIEL3    CIELO           1.112.196.638
COGN3    COGNA ON        1.847.994.874
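The script only requests page 1 (20 rows). If you need further pages, one option is to rebuild the URL with a different pageNumber. Below is a minimal sketch of that idea, not a tested client: build_url and parse_qty are hypothetical helper names I made up for illustration, and I'm assuming the endpoint accepts other page numbers encoded the same way as page 1. The second helper just turns the quantity strings, which use "." as a thousands separator in the output above, into integers.

```python
import base64

API_URL = (
    "https://sistemaswebb3-listados.b3.com.br/indexProxy/indexCall/"
    "GetPortfolioDay/{encoded_string}"
)


def build_url(page, index="IBOV", page_size=20):
    """Build the request URL for one page (hypothetical helper)."""
    params = {
        "language": "pt-br",
        "pageNumber": page,
        "pageSize": page_size,
        "index": index,
        "segment": "1",
    }
    # Same encoding as the script above: str(dict) -> UTF-8 -> base64.
    # Assumption: pages other than 1 are accepted in the same format.
    encoded = base64.b64encode(str(params).encode("utf-8")).decode("utf-8")
    return API_URL.format(encoded_string=encoded)


def parse_qty(text):
    """Convert a quantity such as '4.355.174.839' to an int."""
    return int(text.replace(".", ""))


if __name__ == "__main__":
    print(build_url(2))
    print(parse_qty("4.355.174.839"))  # 4355174839
```

Each URL from build_url can be passed to requests.get(...).json() exactly as in the script above, and you can stop looping once data["results"] comes back empty (again, an assumption about how the endpoint behaves past the last page).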
Upvotes: 1