Reputation: 329
I am parsing an HTML document using Beautiful Soup in Python.
I came across a tag like this:
<div class="_3auQ3N">\u20b9<!-- -->1,990</div>
\u20b9 represents the currency symbol (₹) and 1,990 is the price.
How can I extract these two values into separate strings (or values)?
Upvotes: 0
Views: 110
Reputation: 3118
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<div class="_3auQ3N">\u20b9<!-- -->1,990</div>', 'lxml')
>>> list(soup.div.strings)
['₹', '1,990']
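This works because `.strings` yields only the text nodes and skips the `<!-- -->` comment node. If you want the two values in separate variables, something like the following should do it. This is only a sketch: it assumes the div always contains exactly two text pieces, it uses the built-in `html.parser` instead of `lxml` so nothing extra needs installing, and the variable names (`symbol`, `price`, `amount`) are mine, not from the question.

```python
from bs4 import BeautifulSoup

html = '<div class="_3auQ3N">\u20b9<!-- -->1,990</div>'
soup = BeautifulSoup(html, "html.parser")

# Look the div up by its class, as you would in the full document.
div = soup.find("div", class_="_3auQ3N")

# stripped_strings yields the text nodes with surrounding whitespace removed;
# the comment node is not included. Unpacking assumes exactly two pieces.
symbol, price = div.stripped_strings

# Optional: turn the price into a number by dropping the thousands separator.
amount = int(price.replace(",", ""))

print(symbol, price, amount)
```

If the div can hold more or fewer than two text nodes, the unpacking raises a `ValueError`, so in that case collect the pieces with `list(div.stripped_strings)` and check the length first.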
Upvotes: 4
Reputation: 2945
Once you have extracted your string, you may use a regex:
import re
string = "\u20b9<!-- -->1,990"
a = re.findall("(^.*)<!-- -->(.*)", string)
print(a[0][0],a[0][1]) # ₹ 1,990
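If the input might not always contain the comment, a slightly more defensive variant could look like this. It is a sketch under the same assumption as above (the raw inner HTML is already in a string); `split_price` is a hypothetical helper name, and it returns `None` instead of raising an `IndexError` when the pattern does not match.

```python
import re

# Hypothetical helper: splits "symbol<!-- ... -->price" into its two parts.
def split_price(inner_html):
    match = re.fullmatch(
        r"(?P<symbol>.*?)<!--.*?-->(?P<price>.*)",
        inner_html,
        flags=re.DOTALL,
    )
    if match is None:
        return None  # no comment separator found
    return match.group("symbol").strip(), match.group("price").strip()

result = split_price("\u20b9<!-- -->1,990")
print(result)
```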
Upvotes: 0