Reputation: 39
I'm trying to scrape the two tables from a page, but when I use soup.find('table') it just doesn't find them. Also, when I print the soup object, the table part of the HTML is not printed. Any solutions?
My code so far:
from bs4 import BeautifulSoup
import pandas as pd
import requests
url = 'http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/mercado-a-vista/opcoes/posicoes-em-aberto/posicoes-em-aberto-8AE490CA64BA055F0164CCCAE1F1460A.htm?empresaEmissora=AMBEV%20S.A.&data=19/11/2020&dataVencimento=21/12/20&f=0'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
table = soup.find('div').find_all('table')
print(table)
Output:
[]
[Finished in 3.4s]
When I run this:
from bs4 import BeautifulSoup
import pandas as pd
import requests
url = 'http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/mercado-a-vista/opcoes/posicoes-em-aberto/posicoes-em-aberto-8AE490CA64BA055F0164CCCAE1F1460A.htm?empresaEmissora=AMBEV%20S.A.&data=19/11/2020&dataVencimento=21/12/20&f=0'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
table = soup.find('tbody').find_all('tr')
print(table)
I get the traceback below, even though in the page's HTML the table information is inside a tbody > tr, just as in the tables I have scraped before:
Traceback (most recent call last):
File "C:\Users\jvbf9\Documents\data-science\scraping_thiago\main.py", line 11, in <module>
table = soup.find('tbody').find_all('tr')
AttributeError: 'NoneType' object has no attribute 'find_all'
[Finished in 7.2s with exit code 1]
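From the traceback, soup.find('tbody') must be returning None, so there is no <tbody> anywhere in the HTML that requests receives (browsers insert <tbody> automatically, so the element can show up in DevTools without being in the page source). A minimal sketch of a check along those lines, not my real scraper:
from bs4 import BeautifulSoup
import requests
url = 'http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/mercado-a-vista/opcoes/posicoes-em-aberto/posicoes-em-aberto-8AE490CA64BA055F0164CCCAE1F1460A.htm?empresaEmissora=AMBEV%20S.A.&data=19/11/2020&dataVencimento=21/12/20&f=0'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
# find() returns None when the tag is absent, so guard before chaining find_all()
tbody = soup.find('tbody')
if tbody is None:
    print('no <tbody> in the fetched HTML')
else:
    print(len(tbody.find_all('tr')), 'rows')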
Upvotes: 1
Views: 400
Reputation: 36
When you create the parser, don't pass the decoded text; pass the raw content instead:
from bs4 import BeautifulSoup
import pandas as pd
import requests
url = 'http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/mercado-a-vista/opcoes/posicoes-em-aberto/posicoes-em-aberto-8AE490CA64BA055F0164CCCAE1F1460A.htm?empresaEmissora=AMBEV%20S.A.&data=19/11/2020&dataVencimento=21/12/20&f=0'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')  # raw bytes; BeautifulSoup handles the decoding itself
table = soup.find('div').find_all('table')
print(table)
That should be what was causing the problem.
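If the tables do appear in the fetched HTML after this change, you can also skip the manual find_all and let pandas build the DataFrames for you. A minimal sketch, assuming the tables are present in the static response (if the page fills them in with JavaScript, requests won't see them either way):
from io import StringIO
import pandas as pd
import requests
url = 'http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/mercado-a-vista/opcoes/posicoes-em-aberto/posicoes-em-aberto-8AE490CA64BA055F0164CCCAE1F1460A.htm?empresaEmissora=AMBEV%20S.A.&data=19/11/2020&dataVencimento=21/12/20&f=0'
r = requests.get(url)
# read_html parses every <table> in the document into a list of DataFrames;
# it raises ValueError if no tables are found
tables = pd.read_html(StringIO(r.text))
print(len(tables))
print(tables[0].head())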
Upvotes: 1