Reputation: 125
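import urllib
# Python 2: fetch the page and print it line by line
data = urllib.urlopen("https://www.python.org/")
for line in data:
    line = line.strip()  # strip() returns a new string, so keep the result
    print line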
I am trying to make a web crawler, but when I run the above code, the HTML markup also gets printed. I only want the text portion of the web page and the hyperlinks.
Upvotes: 1
Views: 53
Reputation: 244
Use the Beautiful Soup library to parse the HTML. It can pull out the page text and the hyperlinks without the surrounding tags, which is what a web crawler needs.
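A minimal sketch of that approach, assuming Python 3 and the third-party `beautifulsoup4` package are installed. The helper name `extract_text_and_links` and the sample HTML are made up for illustration; a real crawler would pass in the bytes or string it downloaded.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def extract_text_and_links(html):
    """Return (visible text, list of href values) for an HTML string.

    Hypothetical helper for illustration, not part of Beautiful Soup.
    """
    # "html.parser" is the parser bundled with Python, so no extra dependency.
    soup = BeautifulSoup(html, "html.parser")

    # Remove <script> and <style> elements so their contents
    # cannot end up in the extracted text.
    for tag in soup(["script", "style"]):
        tag.decompose()

    # Join the remaining text nodes, one per line, with whitespace trimmed.
    text = soup.get_text(separator="\n", strip=True)

    # Collect the target of every <a> tag that has an href attribute.
    links = [a["href"] for a in soup.find_all("a", href=True)]
    return text, links


sample = (
    "<html><head><style>p {color: red}</style></head>"
    '<body><h1>Welcome</h1><p>Read the <a href="/docs">docs</a>.</p></body></html>'
)
text, links = extract_text_and_links(sample)
print(text)
print(links)
```

The links come back exactly as written in the page, so relative ones such as `/docs` would still need to be joined with the page URL (for example with `urllib.parse.urljoin`) before the crawler follows them.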
Upvotes: 1
Reputation: 273
A somewhat rudimentary solution would be to split the page source on "<" and ">" and then go through the resulting pieces, dropping everything that falls between a "<" and the next ">".
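One way this could look, as a rough sketch in Python 3 using only string methods. The function name `strip_tags` is made up for illustration, and the code assumes every "<" in the input really opens a tag.

```python
def strip_tags(html):
    """Crude tag removal: keep only the text that lies outside <...> pairs."""
    pieces = html.split("<")
    # Everything before the first "<" is plain text.
    kept = [pieces[0]]
    for piece in pieces[1:]:
        # Each piece looks like 'tag attrs>text after the tag';
        # keep only what follows the first ">".
        _tag, sep, after = piece.partition(">")
        if sep:
            kept.append(after)
    return "".join(kept)


sample = "<p>Hello <b>world</b>!</p>"
print(strip_tags(sample))  # Hello world!
```

This throws away the hyperlinks along with the other tags, keeps the contents of script and style blocks, and breaks on comments or attribute values that contain ">", so it is only suitable for very simple pages.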
Upvotes: 1