Convert string containing HTML to actual HTML

Question

Set-up

I have various string variables containing HTML; one of them is at https://pastebin.com/rsi3v9nh.

I need to obtain the text inside the HTML. For example, take the following HTML snippet:



50.000 r.p.m.
Dry technique
Controllable by foot pedal
Auto-Cruise
Twist-lock system
100W drill power
7.8 Ncm torque
220V-240V
12-months warranty


[/vc_column_text]

From it, I'd like to obtain the text of each of these items.

Note that this is just an example of part of the entire string; the text is not only inside these elements.

Problem

Using regular expressions alone would be cumbersome, because the patterns are somewhat irregular.

I'm familiar with using Selenium to extract data from HTML, e.g. driver.find_element_by_xpath('div'), but that works only on pages loaded in the browser, not on strings.

I was wondering whether I can somehow convert the string into HTML and then extract the text in a Selenium-like manner.

Any other solution would be ok as well.
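For an XPath-style lookup on a plain string without a browser, one standard-library option is xml.etree.ElementTree, whose findall() accepts a limited XPath subset. This is only a sketch under assumptions: the markup below is hypothetical (the tag names are illustrative), and because ElementTree is an XML parser it will reject HTML that is not well-formed, such as an unclosed <br> or an &nbsp; entity. For real-world HTML an actual HTML parser is the safer choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup; the <ul>/<li> tags are assumptions.
html = """<div>
<ul>
<li>50.000 r.p.m.</li>
<li>Dry technique</li>
<li>Controllable by foot pedal</li>
</ul>
</div>"""

# ElementTree needs exactly one root element, so wrap the fragment.
root = ET.fromstring(f"<root>{html}</root>")

# findall() takes a limited XPath subset, similar in spirit to
# Selenium's XPath lookups, but evaluated on a parsed string.
items = [li.text for li in root.findall(".//li")]
print(items)
```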

user3483203 · Accepted Answer

You definitely don't want to use regular expressions here.

You can use BeautifulSoup to parse it instead:

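from bs4 import BeautifulSoup

# The tags below are an assumed reconstruction of the sample markup.
s = '''<div>
<ul>
<li>50.000 r.p.m.</li>
<li>Dry technique</li>
<li>Controllable by foot pedal</li>
<li>Auto-Cruise</li>
<li>Twist-lock system</li>
<li>100W drill power</li>
<li>7.8 Ncm torque</li>
<li>220V-240V</li>
<li>12-months warranty</li>
</ul>
</div>
<p>[/vc_column_text]</p>'''

# Name the parser explicitly; otherwise BeautifulSoup warns that it guessed one.
soup = BeautifulSoup(s, 'html.parser')
# string=True matches every text node, including whitespace-only ones.
print(soup.find_all(string=True))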

Output:

['\n', '\n', '50.000 r.p.m.', '\n', 'Dry technique', '\n', 'Controllable by foot pedal', '\n', 'Auto-Cruise', '\n', 'Twist-lock system', '\n', '100W drill power', '\n', '7.8 Ncm torque', '\n', '220V-240V', '\n', '12-months warranty', '\n', '\n', '\n', '[/vc_column_text]']
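Many of the strings in that list are whitespace-only '\n' nodes. In BeautifulSoup, iterating over soup.stripped_strings skips those and strips the rest. If you would rather avoid a third-party dependency, the standard library's html.parser can do a similar job. A minimal sketch, where the TextCollector class is a hypothetical helper and the markup is assumed:

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collect every non-blank text node in document order."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        # Skip whitespace-only strings such as the '\n' entries above.
        stripped = data.strip()
        if stripped:
            self.texts.append(stripped)


# Assumed markup; the tag names are illustrative only.
s = """<ul>
<li>50.000 r.p.m.</li>
<li>Dry technique</li>
</ul>
[/vc_column_text]"""

parser = TextCollector()
parser.feed(s)
parser.close()  # flush any trailing text still buffered by the parser
print(parser.texts)
```

Calling close() matters here: the parser may hold back trailing text until it knows the input has ended.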
