JSRB

Reputation: 2613

How to get string before and after <br> tag using Python

I have a data crawler (BeautifulSoup) running which returns the following strings assigned to a variable priceLast:

<td>
200,90<br/>
196,90                          </td>
<td>
20,90<br/>
16,90                           </td>
<td>
2,90<br/>
1,90                            </td>

The amount of whitespace varies from time to time, so I would like to assign the characters between <td> and <br/> to a variable price1, and those right after <br/> up to the first space to a variable price2.

I tried .split as a first approach:

priceLast.split("<br/>")

but this throws:

TypeError: 'NoneType' object is not callable

Upvotes: 0

Views: 1364

Answers (3)

Nico_Robin

Reputation: 89

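Maybe priceLast is not a string? If it is a BeautifulSoup Tag, priceLast.split is looked up as a child tag named split, which does not exist, so it evaluates to None, and calling None raises this TypeError.

I tried it on a plain string below, and .split() works: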

>>> string = "<td>\
... 200,90<br/>\
... 196,90                          </td>"
>>> new = string.split('<br/>')
>>> new
['<td>200,90', '196,90                          </td>']
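To go from the split result to the two values the question asks for, the surrounding tags and whitespace still have to be removed. A minimal sketch, assuming priceLast has first been converted to a plain string (for example with str(priceLast)); the names price_last, price1 and price2 are only illustrative:

```python
# Assumption: this is what str(priceLast) looks like for one table cell.
price_last = "<td>\n200,90<br/>\n196,90                          </td>"

# Split once at the <br/> tag: the left part holds the first price,
# the right part holds the second price plus trailing whitespace.
before, after = price_last.split("<br/>", 1)

# Drop the enclosing tag text, then strip newlines and spaces.
price1 = before.replace("<td>", "").strip()
price2 = after.replace("</td>", "").strip()

print(price1, price2)
```

Because strip() removes any amount of leading and trailing whitespace, the varying number of spaces before </td> does not matter.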

Upvotes: 0

Aysu Sayın

Reputation: 301

You can use regex to get the numbers:

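m = re.findall(r'\d+,\d+', text)

This returns the list of prices in the format 0,0 (digits separated by a comma).

For example:

import re

# Named text rather than str, so the built-in str type is not shadowed.
text = '<td> \
200,90<br/>\
196,90                          </td>'

# Raw string so that \d reaches the regex engine unchanged.
m = re.findall(r'\d+,\d+', text)
print(m)

Output: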

['200,90', '196,90']

More information on regex: https://docs.python.org/3/library/re.html#module-re
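If the string contains several cells, as in the question, the two prices of each cell can be kept together by using capture groups. This is only a sketch: the pattern assumes every cell has exactly the shape <td>, price, <br/>, price, </td>, and the names html, pattern and pairs are illustrative.

```python
import re

# Assumption: the three cells from the question, joined into one string.
html = """<td>
200,90<br/>
196,90                          </td>
<td>
20,90<br/>
16,90                           </td>
<td>
2,90<br/>
1,90                            </td>"""

# One capture group per price; \s* absorbs newlines and the varying spaces.
pattern = re.compile(r"<td>\s*(\d+,\d+)\s*<br\s*/?>\s*(\d+,\d+)\s*</td>")

# findall returns one (price1, price2) tuple per matching cell.
pairs = pattern.findall(html)

for price1, price2 in pairs:
    print(price1, price2)
```

A cell that deviates from this shape (for example, one with only a single price) would simply not match, so it is worth comparing len(pairs) with the expected number of cells.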

Upvotes: 1

drec4s

Reputation: 8077

You can get the text from the td tag, strip the surrounding whitespace, and split at the newline:

from bs4 import BeautifulSoup

h = """
<td>
200,90<br/>
196,90                          </td>
"""

soup = BeautifulSoup(h, "html.parser")
prices = soup.find("td").text.strip().split("\n")
print(prices[0], prices[1])
# 200,90 196,90
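The same idea can be extended to all three cells from the question. A minimal sketch, assuming the cells are available in one HTML string; it uses find_all together with the stripped_strings generator, which yields each text fragment of a tag with whitespace removed. The names html and rows are illustrative.

```python
from bs4 import BeautifulSoup

# Assumption: the three cells from the question in a single HTML string.
html = """
<td>
200,90<br/>
196,90                          </td>
<td>
20,90<br/>
16,90                           </td>
<td>
2,90<br/>
1,90                            </td>
"""

soup = BeautifulSoup(html, "html.parser")

# For each cell, collect its whitespace-stripped text fragments;
# the <br/> tag itself contributes no text, so each cell gives two strings.
rows = [tuple(td.stripped_strings) for td in soup.find_all("td")]

for price1, price2 in rows:
    print(price1, price2)
```

Unlike splitting on "\n", this does not depend on where the line breaks fall inside the cell.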

Upvotes: 5
