user2070615

Reputation: 53

Urllib html not showing

When I use the urllib module, I can call/print/search the HTML of a website the first time, but when I try again it is gone. How can I keep the HTML throughout the program?

For example, when I try:


import re
import urllib.request

html = urllib.request.urlopen('http://www.bing.com/search?q=Mike&go=&qs=n&form=QBLH&filt=all&pq=mike&sc=8-2&sp=-1&sk=')
search = re.findall(r'Mike',str(html.read()))

search

I get:

['Mike','Mike','Mike','Mike']


But then when I try to do this a second time like so:

results = re.findall(r'Mike',str(html.read()))

I get:

[]

when calling 'results'.

Why is this and how can I stop it from happening/fix it?

Upvotes: 0

Views: 107

Answers (2)

Mark Tolonen

Reputation: 178409

In addition to @rvalvik's correct guess that you can only read a stream once, data = str(html.read()) is incorrect. urlopen returns a response object whose read() method returns a bytes object, and str returns the display representation of that object. An example:

>>> data = b'Mike'
>>> str(data)
"b'Mike'"
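Searching for plain ASCII like 'Mike' happens to work on that representation, but anything non-ASCII gets escaped and will no longer match. A minimal sketch, assuming a made-up page body (the string 'Mike Müller' is only an illustration, not taken from the real page):

```python
import re

# Hypothetical page body containing a non-ASCII character, encoded as UTF-8.
raw = 'Mike Müller'.encode('utf-8')

as_repr = str(raw)             # display representation of the bytes object
as_text = raw.decode('utf-8')  # the actual text

repr_hits = re.findall(r'Müller', as_repr)
text_hits = re.findall(r'Müller', as_text)

print(as_repr)    # the ü shows up as escaped bytes, not as a character
print(repr_hits)  # nothing matches in the representation
print(text_hits)  # the decoded text matches as expected
```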

What you should do is either decode the bytes object using the encoding of the HTML page (UTF-8 in this case):

from urllib.request import urlopen
import re

with urlopen('http://www.bing.com/search?q=Mike&go=&qs=n&form=QBLH&filt=all&pq=mike&sc=8-2&sp=-1&sk=') as html:
    data = html.read().decode('utf8')

print(re.findall(r'Mike',data))

or search with a bytes object:

from urllib.request import urlopen
import re

with urlopen('http://www.bing.com/search?q=Mike&go=&qs=n&form=QBLH&filt=all&pq=mike&sc=8-2&sp=-1&sk=') as html:
    data = html.read()

print(re.findall(rb'Mike',data))
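If you don't want to hard-code the encoding, the response headers usually state it. A rough sketch of that idea, assuming the server sends a charset in Content-Type and falling back to UTF-8 otherwise; decode_body and fetch_text are hypothetical helper names, and the headers are faked here so the example runs without a network connection:

```python
from email.message import Message
from urllib.request import urlopen


def decode_body(body, headers, default='utf-8'):
    """Decode bytes using the charset from the headers, else the default."""
    charset = headers.get_content_charset() or default
    return body.decode(charset, errors='replace')


def fetch_text(url):
    """Read the response once and return its decoded text (needs network)."""
    with urlopen(url) as response:
        return decode_body(response.read(), response.headers)


# Offline demonstration with hand-built headers standing in for response.headers.
headers = Message()
headers['Content-Type'] = 'text/html; charset=ISO-8859-1'
text = decode_body('Mike Müller'.encode('latin-1'), headers)
print(text)
```

With a real URL you would call fetch_text(url) once and run every search against the returned string.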

Upvotes: 1

rvalvik

Reputation: 1559

Without being very well versed in Python, I'm guessing html.read() reads the HTTP stream, so when you call it a second time there is nothing left to read.

Try:

html = urllib.request.urlopen('http://www.bing.com/search?q=Mike&go=&qs=n&form=QBLH&filt=all&pq=mike&sc=8-2&sp=-1&sk=')
data = str(html.read())
search = re.findall(r'Mike',data)
search

And then use

results = re.findall(r'Mike',data)
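To see the read-once behaviour without hitting the network, here is a small sketch that uses io.BytesIO as a stand-in for the response object (an assumption for illustration; both are file-like objects that remember how far they have been read). The content is made up:

```python
import io
import re

# io.BytesIO stands in for the object returned by urlopen.
stream = io.BytesIO(b'<html>Mike and Mike</html>')

first = stream.read()   # consumes everything up to the end of the stream
second = stream.read()  # nothing left, so this is empty

# Keep the data from the first read and reuse it as often as needed.
data = first.decode('utf-8')
search = re.findall(r'Mike', data)
results = re.findall(r'Mike', data)

print(second)
print(search)
print(results)
```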

Upvotes: 2
