Reputation: 2613
I have a data crawler (BeautifulSoup) running that returns the following strings, which are assigned to a variable priceLast:
<td>
200,90<br/>
196,90 </td>
<td>
20,90<br/>
16,90 </td>
<td>
2,90<br/>
1,90 </td>
The spacing varies from time to time, so I would like to assign the characters between <td> and <br/> (the XXXX in <td>XXXX<br/>) to a variable price1, and the characters right after <br/>, up to the first space, to a variable price2.
I tried .split() to approach a solution:
priceLast.split("<br/>")
but this throws:
TypeError: 'NoneType' object is not callable
Upvotes: 0
Views: 1364
Reputation: 89
Maybe priceLast is not a string?
I tried the following, and .split() works on a plain string:
>>> string = "<td>\
... 200,90<br/>\
... 196,90 </td>"
>>> new = string.split('<br/>')
>>> new
['<td>200,90', '196,90 </td>']
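If priceLast is a BeautifulSoup Tag rather than a str, that would explain the error: bs4 treats an unknown attribute such as tag.split as a shortcut for tag.find("split"), which returns None when there is no <split> child, so calling it raises TypeError: 'NoneType' object is not callable. Below is a minimal sketch, assuming priceLast is the <td> Tag returned by soup.find("td") (the variable html and that lookup are illustrative assumptions). It converts the Tag to its HTML string with str() before splitting, then applies the rule from the question (price2 is everything after <br/> up to the first whitespace):

```python
from bs4 import BeautifulSoup

# HTML taken from the question; how priceLast was obtained is an assumption
html = """
<td>
200,90<br/>
196,90 </td>
"""

soup = BeautifulSoup(html, "html.parser")
priceLast = soup.find("td")  # a bs4 Tag, not a str

# On a Tag, an unknown attribute is looked up as a child tag:
# priceLast.split is priceLast.find("split"), which is None here,
# so priceLast.split("<br/>") would raise TypeError.
print(priceLast.split)

# Converting the Tag to its HTML string first makes str.split available
parts = str(priceLast).split("<br/>")
price1 = parts[0].replace("<td>", "").strip()  # text between <td> and <br/>
price2 = parts[1].split()[0]                   # text after <br/> up to first whitespace
print(price1, price2)
```

Note that the str() route depends on how bs4 serializes the tag (e.g. <br/> rather than <br>); working on the tag's text instead avoids that dependency.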
Upvotes: 0
Reputation: 301
You can use a regex to get the numbers:
m = re.findall(r'\d+,\d+', html)
This returns a list of the prices in the format 0,0 (digits separated by a comma).
For example:
import re
html = '<td> \
200,90<br/>\
196,90 </td>'
# raw string so that \d reaches the regex engine unchanged
m = re.findall(r'\d+,\d+', html)
print(m)
output:
['200,90', '196,90']
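Because the pattern matches only the digits and the comma, whitespace around the prices does not matter. A short sketch applying it to the question's three cells and unpacking each result into price1 and price2; the extra spaces in the third cell are a hypothetical variation added to illustrate the varying spacing:

```python
import re

# The three <td> snippets from the question; the spacing in the last one
# is a hypothetical variation
cells = [
    "<td>\n200,90<br/>\n196,90 </td>",
    "<td>\n20,90<br/>\n16,90 </td>",
    "<td>  2,90 <br/>   1,90   </td>",
]

price_pattern = re.compile(r"\d+,\d+")  # digits, a comma, digits

results = []
for cell in cells:
    # Unpacking raises ValueError if a cell does not hold exactly two prices
    price1, price2 = price_pattern.findall(cell)
    results.append((price1, price2))
    print(price1, price2)
```

The unpacking is deliberately strict: a cell with a missing or extra price fails loudly instead of silently shifting values between price1 and price2.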
More information on regex: https://docs.python.org/3/library/re.html#module-re
Upvotes: 1
Reputation: 8077
You can get the text from the td tag and split it at the newline:
from bs4 import BeautifulSoup
h = """
<td>
200,90<br/>
196,90 </td>
"""
soup = BeautifulSoup(h, "html.parser")
prices = soup.find("td").text.strip().split("\n")
print(prices[0], prices[1])
#200,90 196,90
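Splitting at "\n" assumes each price sits on its own line with no trailing spaces before <br/>. Since the question says the spacing varies, a variant that may be more robust uses Tag.stripped_strings, which yields each text fragment of the tag with surrounding whitespace removed and skips whitespace-only fragments. A minimal sketch over all three cells from the question:

```python
from bs4 import BeautifulSoup

# All three cells from the question
h = """
<td>
200,90<br/>
196,90 </td>
<td>
20,90<br/>
16,90 </td>
<td>
2,90<br/>
1,90 </td>
"""

soup = BeautifulSoup(h, "html.parser")

prices = []
for td in soup.find_all("td"):
    # stripped_strings yields "200,90" and "196,90" regardless of the
    # whitespace around them; unpacking expects exactly two fragments
    price1, price2 = td.stripped_strings
    prices.append((price1, price2))
    print(price1, price2)
```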
Upvotes: 5