P A N

Reputation: 5922

Python Selenium: Remove certain characters from web page body

I'm running Selenium with Firefox in Python, and I'm trying to match elements on a page against keywords in a list.

For the element lookup to be successful, I need to get rid of some special characters like ® and ™ on the web page. I am unfortunately not able to predict when such characters are employed, and I therefore can't add them on the "keyword end" of the problem.

I don't think that Selenium or Firefox itself can remove unwanted characters from a webpage, but my thought was to have Selenium execute some JavaScript on the page to remove those characters. Is that possible?

Something like this (presumably non-working) pseudo-code:

driver.execute_script("document.body.innerHTML.replace(/®/g, '');")

The replacement should happen before the driver tries to "read" the page and find_element.

FYI the characters I want to get rid of are in <a> text() nodes in <td> cells across the document body.

Upvotes: 1

Views: 1342

Answers (1)

Mateusz Ostaszewski

Reputation: 329

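ASCII covers code points 0 to 127 (\x00 to \x7F), so you can strip every non-ASCII character. Note that replace() returns a new string, so the result has to be assigned back to the page:

document.body.innerHTML = document.body.innerHTML.replace(/[^\x00-\x7F]/g, '');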
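To see what that character class removes, here is a small sketch of the same pattern using Python's re module. The helper name strip_non_ascii and the sample strings are made up for illustration. It shows that the pattern drops ® and ™ but also accented letters, which may or may not be what you want.

```python
import re

# Same character class as the JavaScript pattern: anything outside 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def strip_non_ascii(text):
    """Return text with every non-ASCII character removed (hypothetical helper)."""
    return NON_ASCII.sub("", text)


print(strip_non_ascii("Acme® Widget™"))  # Acme Widget
print(strip_non_ascii("Café"))           # Caf  (the accented letter is lost too)
```

If the pages contain non-English text, listing the unwanted characters explicitly is probably safer than removing everything outside ASCII.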

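If you want to remove only ®, use the g flag so that every occurrence is replaced, not just the first one, and again assign the result back:

document.body.innerHTML = document.body.innerHTML.replace(/®/g, '');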
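From the Python side, one way this could be wired up is sketched below. It is not the only approach: instead of rewriting innerHTML (which rebuilds the whole DOM and drops any attached event listeners), it walks the text nodes under the body and edits them in place. The function name remove_chars_from_page and the default character list are assumptions for illustration; execute_script passes extra Python arguments to the script as arguments[0], arguments[1], and so on.

```python
# JavaScript run in the page: visit every text node under <body> and
# delete each unwanted character. Returns how many nodes were changed.
REMOVE_CHARS_JS = """
const chars = arguments[0];
const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
let changed = 0;
let node;
while ((node = walker.nextNode())) {
    let text = node.nodeValue;
    for (const ch of chars) {
        text = text.split(ch).join('');
    }
    if (text !== node.nodeValue) {
        node.nodeValue = text;
        changed += 1;
    }
}
return changed;
"""


def remove_chars_from_page(driver, chars=("®", "™")):
    """Strip the given characters from all text nodes of the loaded page.

    `driver` is any Selenium WebDriver instance; the name and defaults
    here are illustrative, not part of Selenium itself.
    """
    return driver.execute_script(REMOVE_CHARS_JS, list(chars))


# Usage sketch (assumes Selenium and Firefox are installed; URL is a placeholder):
# from selenium import webdriver
# from selenium.webdriver.common.by import By
# driver = webdriver.Firefox()
# driver.get("https://example.com")
# remove_chars_from_page(driver)          # run before any find_element call
# link = driver.find_element(By.LINK_TEXT, "Acme Widget")
```

The cleanup has to be repeated after every page load or after any script on the page re-renders the table, since it only changes the DOM as it exists at the moment of the call.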

Upvotes: 2
