Vitor

Reputation: 29

requests and BeautifulSoup <tables>

I'm trying to pull a single value from a table on a web page. How could I change this code so it pulls only one value from the table?

Specifically, I'm trying to pull just the value 0.83. How could I do that?

    import requests
    from bs4 import BeautifulSoup
    
    
    url = 'https://www.gov.br/receitafederal/pt-br/assuntos/orientacao-tributaria/pagamentos-e-parcelamentos/taxa-de-juros-selic#Taxa_de_Juros_Selic'
    
    headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36"}
    
    
    page = requests.get(url ,headers=headers)
    
    
    #print(page.content)
    #span class = DFlfde SwHCTb
    soup = BeautifulSoup(page.content, "html.parser")
    
    valor_taxa = soup.find_all("table",class_ ="listing" )[0]
    valor_tr = soup.find_all("tr",class_="odd")
    valor_especifico = soup.select('td', class_={'align': 'CENTER'})
    
    
    print(valor_especifico)

The output is:

C:\Users\Francisco\PycharmProjects\INSS\Scripts\python.exe C:/Users/Francisco/PycharmProjects/INSS/web.py
[<td>
<ul>
<li></li>
</ul>
</td>, <td> <strong><a class="anchor-link" href="#Taxa_de_Juros_Selic" target="_self" title="">Taxa de Juros Selic</a></strong></td>, <td>
<ul>
<li></li>
</ul>
</td>, <td><strong><a class="anchor-link" href="#Selicacumulada" target="_self" title=""> </a><a class="anchor-link" href="#Selicmensalmente" target="_self" title="">Taxa de Juros Selic Acumulada Mensalmente</a></strong></td>, <td>
<ul>
<li></li>
</ul>
</td>, <td> <a class="anchor-link" href="#Taxa" target="_self" title=""><strong>Taxa de Juros Selic Incidente sobre as Quotas do Imposto de Renda Pessoa Físic</strong>a</a></td>, <td align="LEFT"><b>Mês/Ano</b></td>, <td align="CENTER"><b>2013</b></td>, <td align="CENTER"><b>2014</b></td>, <td align="CENTER"><b>2015</b></td>, <td align="CENTER"><b>2016</b></td>, <td align="CENTER"><b>2017</b></td>, <td align="CENTER"><b>2018</b></td>, <td align="CENTER"><b>2019</b></td>, <td align="CENTER"><b>2020</b></td>, <td align="CENTER"><b>2021</b></td>, <td align="CENTER"><b>2022</b></td>, <td align="LEFT"><b>Janeiro</b></td>, <td align="CENTER">0,60%</td>, <td align="CENTER">0,85%</td>, <td align="CENTER">0,94%</td>, <td align="CENTER">1,06%</td>, <td align="CENTER">1,09%</td>, <td align="CENTER">0,58%</td>, <td align="CENTER">0,54%</td>, <td align="CENTER">0,38%</td>, <td align="CENTER">0,15%</td>, <td align="CENTER">0,73%</td>, <td align="LEFT"><b>Fevereiro</b></td>, <td align="CENTER">0,49%</td>, <td align="CENTER">0,79%</td>, <td align="CENTER">0,82%</td>, <td align="CENTER">1,00%</td>, <td align="CENTER">0,87%</td>, <td align="CENTER">0,47%</td>, <td align="CENTER">0,49%</td>, <td align="CENTER">0,29%</td>, <td align="CENTER">0,13%</td>, <td align="CENTER">0,76%</td>, <td align="LEFT"><b>Março</b></td>, <td align="CENTER">0,55%</td>, <td align="CENTER">0,77%</td>, <td align="CENTER">1,04%</td>, <td align="CENTER">1,16%</td>, <td align="CENTER">1,05%</td>, <td align="CENTER">0,53%</td>, <td align="CENTER">0,47%</td>, <td align="CENTER">0,34%</td>, <td align="CENTER">0,20%</td>, <td align="CENTER">0,93%</td>, <td align="LEFT"><b>Abril</b></td>, <td align="CENTER">0,61%</td>, <td align="CENTER">0,82%</td>, <td align="CENTER">0,95%</td>, <td align="CENTER">1,06%</td>, <td align="CENTER">0,79%</td>, <td align="CENTER">0,52%</td>, <td align="CENTER">0,52%</td>, <td align="CENTER">0,28%</td>, <td align="CENTER">0,21%</td>, <td align="CENTER">0,83%</td>, <td align="LEFT"><b>Maio</b></td>, <td align="CENTER">0,60%</td>, <td align="CENTER">0,87%</td>, <td align="CENTER">0,99%</td>, <td align="CENTER">1,11%</td>, <td align="CENTER">0,93%</td>, <td align="CENTER">0,52%</td>, <td align="CENTER">0,54%</td>, <td align="CENTER">0,24%</td>, <td align="CENTER">0,27%</td>, <td align="CENTER"></td>, <td align="LEFT"><b>Junho</b></td>, <td align="CENTER">0,61%</td>, <td align="CENTER">0,82%</td>, <td align="CENTER">1,07%</td>, <td align="CENTER">1,16%</td>, <td align="CENTER">0,81%</td>, <td align="CENTER">0,52%</td>, <td align="CENTER">0,47%</td>, <td align="CENTER">0,21%</td>, <td align="CENTER">0,31%</td>, <td align="CENTER"></td>, <td align="LEFT"><b>Julho</b></td>, <td align="CENTER">0,72%</td>, <td align="CENTER">0,95%</td>, <td align="CENTER">1,18%</td>, <td align="CENTER">1,11%</td>, <td align="CENTER">0,80%</td>, <td align="CENTER">0,54%</td>

Process finished with exit code 0

Upvotes: 1

Views: 71

Answers (2)

Adon Bilivit

Reputation: 26976

There are more succinct ways to do this, but breaking it down into individual steps may make it clearer.

Do a GET on the URL and check the HTTP status.

Build 'soup' from the response text.

Iterate over each table, tr, and td, finally printing the text of the lowest-level tds.

    import requests
    from bs4 import BeautifulSoup as BS

    (r := requests.get('https://www.gov.br/receitafederal/pt-br/assuntos/orientacao-tributaria/pagamentos-e-parcelamentos/taxa-de-juros-selic#Taxa_de_Juros_Selic')).raise_for_status()

    soup = BS(r.text, 'lxml')

    for table in soup.find_all('table', {'class': 'listing'}):
        for tr in table.find_all('tr', {'class': 'odd'}):
            for td in tr.find_all('td', {'align': 'CENTER'}):
                print(td.text)
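
If only one cell is wanted (the question asks for 0.83, which appears as 0,83% in the Abril/2022 cell of the output above), the same soup can be narrowed down by row label and column year. This is only a sketch and assumes the page keeps its current layout: a "listing" table whose first row contains "Mês/Ano" followed by the years, and month rows whose first cell holds the month name.

    import requests
    from bs4 import BeautifulSoup as BS

    url = ('https://www.gov.br/receitafederal/pt-br/assuntos/orientacao-tributaria/'
           'pagamentos-e-parcelamentos/taxa-de-juros-selic#Taxa_de_Juros_Selic')
    (r := requests.get(url)).raise_for_status()
    soup = BS(r.text, 'lxml')

    for table in soup.find_all('table', {'class': 'listing'}):
        rows = table.find_all('tr')
        # Header row: "Mês/Ano", 2013, ..., 2022 -> find the column index of 2022
        header = [td.get_text(strip=True) for td in rows[0].find_all('td')]
        if 'Mês/Ano' not in header:
            continue                      # not the month/year table
        col = header.index('2022')
        for tr in rows[1:]:
            cells = [td.get_text(strip=True) for td in tr.find_all('td')]
            if cells and cells[0] == 'Abril':
                print(cells[col])         # -> 0,83%
                break
        break                             # only the first matching table

Looking the year up in the header row instead of hard-coding a cell index keeps the lookup working if a column for a new year is added.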

Upvotes: 1

dkantor

Reputation: 162

Try the selenium package:

    from selenium.webdriver.common.by import By
    from selenium import webdriver
    from webdriver_manager.chrome import ChromeDriverManager

    url = 'https://www.gov.br/receitafederal/pt-br/assuntos/orientacao-tributaria/pagamentos-e-parcelamentos/taxa-de-juros-selic#Taxa_de_Juros_Selic'

    driver = webdriver.Chrome(ChromeDriverManager().install())
    driver.get(url)
    elementsxpath = '...'  # !!!!! define the xpath of the element
    print(driver.find_element(By.XPATH, elementsxpath).text)
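
The XPath still has to be filled in. As a purely hypothetical illustration (not verified against the live page), continuing from the snippet above and assuming the layout shown in the question's output, where the 2022 value is the 11th cell of the row whose first cell reads "Abril":

    # Hypothetical XPath, based only on the layout visible in the question's
    # output: first row whose first cell reads "Abril"; its 11th td is the
    # 2022 column (0,83%).
    elementsxpath = "//tr[td/b='Abril']/td[11]"
    print(driver.find_element(By.XPATH, elementsxpath).text)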

Upvotes: 0
