DanielTheRocketMan

Reputation: 3249

BeautifulSoup: Getting empty variables

I have been trying to get the values of some fields from a web page:

from urllib.request import urlopen
from bs4 import BeautifulSoup

itemPage = 'https://dadosabertos.camara.leg.br/api/v2/legislaturas/1'
url = urlopen(itemPage)
soupItem = BeautifulSoup(url, 'lxml')
dataInicio = soupItem.find('dataInicio')
dataFim = soupItem.find('dataFim')

However, dataInicio and dataFim both come back empty (None). What am I doing wrong?

Upvotes: 1

Views: 210

Answers (1)

ggorlen

Reputation: 56993

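There are a couple of issues here. First, the API returns JSON by default, not XML. urlopen(itemPage) gives you an http.client.HTTPResponse object (print url and you'll see something like <http.client.HTTPResponse object at 0x036D7770>), and reading it produces a JSON byte string, so there are no <dataInicio> or <dataFim> tags for BeautifulSoup to find. That JSON is perfectly usable on its own. But if you'd prefer to stick with XML parsing, I'd recommend using the third-party requests library to fetch the raw XML text, sending an Accept: application/xml header so the server responds with XML.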
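If you go the JSON route instead, the standard library is enough. The sketch below is a minimal illustration, not the API's documented client: it assumes the response wraps the record in a top-level "dados" object (which appears to be how this API shapes its responses), and the names fetch_json and extract_dates are hypothetical helpers.

```python
import json
from typing import Tuple
from urllib.request import Request, urlopen

ITEM_PAGE = "https://dadosabertos.camara.leg.br/api/v2/legislaturas/1"


def fetch_json(url: str) -> bytes:
    """Download the raw response body, explicitly asking for JSON."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return response.read()  # a bytes object, not a str


def extract_dates(raw: bytes) -> Tuple[str, str]:
    """Pull dataInicio and dataFim out of the JSON payload.

    Assumption: the record sits under a top-level "dados" key.
    """
    payload = json.loads(raw)  # json.loads accepts bytes on Python 3.6+
    record = payload["dados"]
    return record["dataInicio"], record["dataFim"]
```

Calling extract_dates(fetch_json(ITEM_PAGE)) should then give you the two date strings directly, with no HTML/XML parser involved, provided the payload has the assumed shape.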

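Second, when you create your soup object, you need to pass features="xml" instead of "lxml". "lxml" selects lxml's HTML parser, which lower-cases tag names (dataInicio becomes datainicio, so find("dataInicio") matches nothing), while "xml" selects lxml's XML parser, which preserves case. Both require lxml to be installed.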
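The case-folding behavior is easy to see with the standard library alone. This sketch uses Python's built-in html.parser and xml.etree.ElementTree as stand-ins for the two lxml parsers (it does not call BeautifulSoup); SNIPPET is a hypothetical fragment modeled on the output shown further down.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like the API's XML response
SNIPPET = "<dados><dataInicio>1826-04-29</dataInicio><dataFim>1830-04-24</dataFim></dados>"


class TagCollector(HTMLParser):
    """Record every start-tag name the stdlib HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)  # HTMLParser passes the name lower-cased


def html_tag_names(markup: str) -> list:
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def xml_tag_names(markup: str) -> list:
    root = ET.fromstring(markup)
    return [element.tag for element in root.iter()]  # XML keeps the original case


print(html_tag_names(SNIPPET))  # ['dados', 'datainicio', 'datafim']
print(xml_tag_names(SNIPPET))   # ['dados', 'dataInicio', 'dataFim']
```

An HTML parser treats tag names as case-insensitive and normalizes them, whereas XML names are case-sensitive, which is why a camelCase search only works with an XML parser.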

Putting it all together:

import requests
from bs4 import BeautifulSoup

item_page = "https://dadosabertos.camara.leg.br/api/v2/legislaturas/1"
# Ask the API for XML instead of its default JSON
response = requests.get(item_page, headers={"accept": "application/xml"})
# "xml" selects lxml's XML parser, which keeps camelCase tag names intact
soup = BeautifulSoup(response.text, "xml")

data_inicio = soup.find("dataInicio")
data_fim = soup.find("dataFim")
print(data_inicio)
print(data_fim)

Output:

<dataInicio>1826-04-29</dataInicio>
<dataFim>1830-04-24</dataFim>
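Those are the whole tags; with the code above, data_inicio.text gives just the date string. If you'd rather avoid the requests and bs4 dependencies, a stdlib-only variant is sketched below. It swaps in xml.etree.ElementTree for BeautifulSoup and urllib for requests; fetch_xml and tag_texts are hypothetical helper names, and it assumes the target elements appear somewhere below the document's root element.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional
from urllib.request import Request, urlopen

ITEM_PAGE = "https://dadosabertos.camara.leg.br/api/v2/legislaturas/1"


def fetch_xml(url: str) -> bytes:
    """Download the response body, asking the server for XML."""
    request = Request(url, headers={"Accept": "application/xml"})
    with urlopen(request) as response:
        return response.read()


def tag_texts(raw: bytes, *names: str) -> Dict[str, Optional[str]]:
    """Map each (case-sensitive) tag name to the text of its first match, or None."""
    root = ET.fromstring(raw)
    results = {}
    for name in names:
        element = root.find(f".//{name}")  # search at any depth below the root
        results[name] = element.text if element is not None else None
    return results
```

Usage would be tag_texts(fetch_xml(ITEM_PAGE), "dataInicio", "dataFim"). Note that ElementTree matching is case-sensitive, so "datainicio" would come back as None.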

Upvotes: 2
