oskar333

Reputation: 167

Beautifulsoup4 Python extracting data

I am trying to extract the address from this site, and the HTML looks like this:

<div class="col-xs-12 col-sm-6 col-address">
<div>ul. Małachowskiego 45<br />42-500 Będzin<br />woj. śląskie</div>
</div>

So far I use:

soup = BeautifulSoup(firma, "lxml")
address = soup.find("div", class_="col-address")
if address:
    address_firmy = address.text

And I get: "ul. Małachowskiego 4542-500 Będzinwoj. śląskie"

So now two questions:

  1. How do I put spaces where the br tags originally were?
  2. How can I split the string into separate fields (in CSV): street, postcode, town, area?

It is probably very simple but I am totally new to programming and Python... ;)

Upvotes: 0

Views: 33

Answers (1)

宏杰李

Reputation: 12158

In [56]: soup.div.get_text(separator=',', strip=True)
Out[56]: 'ul. Małachowskiego 45,42-500 Będzin,woj. śląskie'
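
  • separator sets the string used to join the bits of text together. Here a comma is inserted where each br tag was; pass separator=' ' instead if you only want spaces.

  • strip=True tells Beautiful Soup to strip whitespace from the beginning and end of each bit of text, and to skip bits that are empty after stripping.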
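
For the second question, here is a minimal sketch of one way to turn that comma-joined string into four CSV fields. It assumes every address has exactly three lines (street, "postcode town", area) and that the postcode never contains a space. The helper name split_address is hypothetical, and the CSV is written to an in-memory buffer rather than a real file to keep the example self-contained.

```python
import csv
import io


def split_address(text, separator=","):
    """Split 'street,postcode town,area' into [street, postcode, town, area].

    Assumes exactly three separator-delimited parts; raises ValueError otherwise.
    """
    street, postcode_town, area = [part.strip() for part in text.split(separator)]
    # The postcode is everything before the first space; the rest is the town.
    postcode, town = postcode_town.split(" ", 1)
    return [street, postcode, town, area]


# The string produced by get_text(separator=',', strip=True) above.
row = split_address("ul. Małachowskiego 45,42-500 Będzin,woj. śląskie")

# Write a header and the row as CSV; swap the buffer for
# open("addresses.csv", "w", newline="", encoding="utf-8") to write a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["street", "postcode", "town", "area"])
writer.writerow(row)
csv_text = buffer.getvalue()
print(csv_text)
```

If a street name could itself contain a comma, a separator that is unlikely to appear in the data (for example '|') in both get_text and split_address would probably be safer.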

Upvotes: 1
