John

Reputation: 39

Web scraping in Python: the HTML page is not fully retrieved

I'm trying to scrape the two tables from a page.

But when I use soup.find('table'), it doesn't find anything. Also, when I print the soup object, the table part of the HTML is missing from the output. Any solutions?

My code so far:

from bs4 import BeautifulSoup
import pandas as pd
import requests

url = 'http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/mercado-a-vista/opcoes/posicoes-em-aberto/posicoes-em-aberto-8AE490CA64BA055F0164CCCAE1F1460A.htm?empresaEmissora=AMBEV%20S.A.&data=19/11/2020&dataVencimento=21/12/20&f=0'

r = requests.get(url)

soup = BeautifulSoup(r.text, 'lxml')

table = soup.find('div').find_all('table')

print(table)

Output:

[]
[Finished in 3.4s]

When I run this:

from bs4 import BeautifulSoup
import pandas as pd
import requests

url = 'http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/mercado-a-vista/opcoes/posicoes-em-aberto/posicoes-em-aberto-8AE490CA64BA055F0164CCCAE1F1460A.htm?empresaEmissora=AMBEV%20S.A.&data=19/11/2020&dataVencimento=21/12/20&f=0'

r = requests.get(url)

soup = BeautifulSoup(r.text, 'lxml')

table = soup.find('tbody').find_all('tr')

print(table)

I get the error below, even though in the page's HTML the table data sits in tbody > tr, just like in the tables I have scraped before:

Traceback (most recent call last):
  File "C:\Users\jvbf9\Documents\data-science\scraping_thiago\main.py", line 11, in <module>
    table = soup.find('tbody').find_all('tr')
AttributeError: 'NoneType' object has no attribute 'find_all'
[Finished in 7.2s with exit code 1]

Upvotes: 1

Views: 400

Answers (1)

Israel Adelaja

Reputation: 36

When you create the parser, pass the response's raw bytes (r.content) instead of the decoded text (r.text):

from bs4 import BeautifulSoup
import pandas as pd
import requests

url = 'http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/mercado-a-vista/opcoes/posicoes-em-aberto/posicoes-em-aberto-8AE490CA64BA055F0164CCCAE1F1460A.htm?empresaEmissora=AMBEV%20S.A.&data=19/11/2020&dataVencimento=21/12/20&f=0'

r = requests.get(url)

soup = BeautifulSoup(r.content, 'lxml')

table = soup.find('div').find_all('table')

print(table)

This is probably the cause of the problem.
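To illustrate the difference between the two attributes, here is a minimal sketch that needs no network access. r.content holds the raw bytes, while r.text is those bytes decoded with an encoding that requests guesses, mainly from the response headers. The page fragment and the wrong guess (ISO-8859-1) below are assumptions made up for the illustration, not something taken from the b3.com.br response.

```python
# Hypothetical page fragment (not from the real site). It declares its
# charset in a <meta> tag, which a parser can only use if it gets bytes.
raw = '<meta charset="utf-8"><td>Posições em aberto</td>'.encode("utf-8")

# Roughly what r.text would look like if the encoding were guessed wrong
# (assumed here to be ISO-8859-1).
wrong_text = raw.decode("iso-8859-1")

# What the encoding declared in the document gives. Passing bytes such as
# r.content lets the parser pick up that declaration itself.
right_text = raw.decode("utf-8")

print(wrong_text)
print(right_text)
```

Note that this only concerns how characters are decoded. The tags themselves are plain ASCII and look the same in both strings.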

Upvotes: 1
