Ungifted

Reputation: 5

How to replace all strings in Python?

I am creating a proxy scraper using regular expressions. HTML parsing with re is terrible, so I need to make sure no strings show up in the end result. How can I replace all strings with a space? The current code I had to clean up the parsed data was

print title.replace(',', '').replace("!", '').replace(":", '').replace(";", '').replace(str, '') 

The str portion was what I tried, but it did not work. Are there any other methods?

Upvotes: 0

Views: 398

Answers (2)

p99will

Reputation: 250

# Blacklist of ASCII codes to strip (see http://www.asciitable.com/).
# It covers all the letters and punctuation, and leaves out the
# digits 0-9 (codes 48-57) and '.' (code 46).
# list() keeps this working on Python 3, where range() is not a list.
replace1 = list(range(0, 46)) + list(range(58, 127)) + [47]

text = '<html><body><p>127.0.0.1</p></body></html>'  # Test data.
tmp = ''

# Go through each character in the text and append it to tmp only if
# its ASCII value is not in the list of blacklisted values.
for ch in text:
    if ord(ch) not in replace1:
        tmp += ch

print(tmp)
# prints: 127.0.0.1
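
As a side note, the same filtering can probably be done in one step with re.sub. The following is only a minimal sketch, not part of the original approach: the function name keep_digits_and_dots is made up for illustration, and it assumes that digits and dots are the only characters worth keeping.

```python
import re

def keep_digits_and_dots(text):
    """Remove every character that is not a digit or a dot."""
    # [^0-9.] matches any single character other than 0-9 and '.'
    return re.sub(r'[^0-9.]', '', text)

sample = '<html><body><p>127.0.0.1</p></body></html>'
cleaned = keep_digits_and_dots(sample)
print(cleaned)  # prints: 127.0.0.1
```

Like the loop, this also keeps digits that appear inside tag names or attributes (for example the 1 in an h1 tag), and several addresses in one document would run together, so it is only suitable for small, known inputs.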

Upvotes: 1

poke

Reputation: 387775

If you want to extract all visible numbers from the HTML document, you can first parse it with BeautifulSoup and extract its text. After that, you can extract all the numbers from those text elements:

from bs4 import BeautifulSoup
from urllib.request import urlopen
import re

# let’s use the StackOverflow homepage as an example
r = urlopen('http://stackoverflow.com')
soup = BeautifulSoup(r, 'html.parser')

# As we don’t want to get the content from script related
# elements, remove those.
for script in soup(['script', 'noscript']):
    script.extract()

# And now extract the numbers using regular expressions from
# all text nodes we can find in the (remaining) document.
numbers = [n for t in soup(text=True) for n in re.findall(r'\d+', t)]

numbers will then contain all the numbers (as strings) that were visible in the document. If you want to restrict the search to only certain elements, you can change the soup(text=True) part.
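
For a proxy list specifically, note that \d+ splits an address such as 127.0.0.1:8080 into separate pieces. A pattern that matches the whole address may fit better. The sketch below is an assumption on my part rather than something from the question: PROXY_RE, find_proxies and sample_nodes are hypothetical names, and the list of strings merely stands in for what soup(text=True) would yield.

```python
import re

# Hypothetical pattern: four groups of 1-3 digits separated by dots,
# optionally followed by ":port". It does not check that each group is <= 255.
PROXY_RE = re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}(?::\d{1,5})?\b')

def find_proxies(text_nodes):
    """Return every address-like match found in an iterable of strings."""
    return [m for t in text_nodes for m in PROXY_RE.findall(t)]

# Stand-in for the text nodes of a parsed page.
sample_nodes = ['Proxy list', '127.0.0.1:8080', 'backup: 10.0.0.2', 'Updated 2014']

# \d+ breaks the address apart:
print(re.findall(r'\d+', sample_nodes[1]))  # prints: ['127', '0', '0', '1', '8080']

# The address pattern keeps each one whole:
print(find_proxies(sample_nodes))  # prints: ['127.0.0.1:8080', '10.0.0.2']
```

To use it with the code above, you could pass soup(text=True) to find_proxies instead of sample_nodes.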

Upvotes: 3
