havox

Reputation: 4877

Python urlopen return value

I'm trying to read URLs from a text file and append each page's HTML to a single txt file:

for line in open('C:\Users\me\Desktop\URLS-HERE.txt'):
 if line.startswith('http') and line.endswith('html\n') :
    fichier = open("C:\Users\me\Desktop\other.txt", "a")
    allhtml = urllib.urlopen(line)
    fichier.write(allhtml)
    fichier.close()

But I get the following error:

TypeError: expected a character buffer object

Upvotes: 2

Views: 6190

Answers (2)

Giova

Reputation: 2005

The value returned by urllib.urlopen() is a file-like object, not a string. Once you have opened it, read its contents with the read() method, as shown in the following snippet:

import urllib

for line in open('C:\Users\me\Desktop\URLS-HERE.txt'):
    if line.startswith('http') and line.endswith('html\n'):
        fichier = open("C:\Users\me\Desktop\other.txt", "a")
        allhtml = urllib.urlopen(line)      # file-like object, not a string
        fichier.write(allhtml.read())       # read() returns the HTML as a string
        fichier.close()
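
For readers on Python 3, where urlopen lives in urllib.request, the same idea might look roughly like the sketch below. The function name append_pages, the temporary demo files, and the acceptance of file:// URLs (there only so the demo runs without a network) are assumptions for illustration, not part of the original question.

```python
# Python 3 sketch; the thread itself uses Python 2's urllib.urlopen.
import tempfile
from pathlib import Path
from urllib.request import urlopen


def append_pages(url_list_path, out_path):
    """Append the body of each listed .html URL to out_path; return the count."""
    fetched = 0
    # Open the output once, in binary append mode, because read() returns bytes.
    with open(url_list_path) as url_list, open(out_path, "ab") as out:
        for line in url_list:
            url = line.strip()  # drop the trailing newline before testing the suffix
            # "file" is accepted only so the demo below works offline (assumption).
            if url.startswith(("http", "file")) and url.endswith("html"):
                with urlopen(url) as response:  # file-like object, not a string
                    out.write(response.read())  # read() yields the page body
                fetched += 1
    return fetched


# Offline demo: a local page reached through a file:// URL.
demo_dir = Path(tempfile.mkdtemp())
page = demo_dir / "page.html"
page.write_bytes(b"<html>hello</html>")
url_file = demo_dir / "urls.txt"
url_file.write_text(page.as_uri() + "\nnot-a-url\n")
out_file = demo_dir / "other.txt"
fetched_count = append_pages(url_file, out_file)
print(fetched_count, out_file.read_bytes())
```

Opening the output file once, outside the loop, avoids reopening it for every URL, and the with blocks close both files even if a download fails.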

Hope this helps!

Upvotes: 3

Maksim Skurydzin

Reputation: 10541

The problem here is that urlopen returns a file-like object rather than a string, so you have to call read() on it to retrieve the HTML:

import urllib2

for line in open(r"C:\Users\me\Desktop\URLS-HERE.txt"):
    if line.startswith('http') and line.endswith('html\n'):
        fichier = open(r"C:\Users\me\Desktop\other.txt", "a")
        allhtml = urllib2.urlopen(line)     # file-like response object
        fichier.write(allhtml.read())       # write the HTML, not the object
        fichier.close()

Please note that the urllib.urlopen function has been marked as deprecated since Python 2.6. It's recommended to use urllib2.urlopen instead.

Additionally, you have to be careful with Windows paths in your code. You should either escape each backslash:

"C:\\Users\\me\\Desktop\\other.txt"

or use the r prefix before the string. When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change, and the backslash itself is kept in the string:

r"C:\Users\me\Desktop\other.txt"
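
A quick way to convince yourself that the two spellings are equivalent is the small check below. The path C:\temp\new.txt is a made-up example, chosen only because \t and \n are recognised escape sequences. Note also that in Python 3 the unescaped, non-raw "C:\Users..." form is a syntax error, because \U starts a Unicode escape.

```python
# Escaped backslashes and a raw string produce the same value.
escaped = "C:\\Users\\me\\Desktop\\other.txt"
raw = r"C:\Users\me\Desktop\other.txt"
same_path = escaped == raw

# Hypothetical path showing what goes wrong without escaping:
# \t is turned into a tab and \n into a newline.
risky = "C:\temp\new.txt"
safe = r"C:\temp\new.txt"

print(same_path)     # True
print(repr(risky))   # the backslashes are gone, replaced by control characters
print(repr(safe))    # the backslashes are preserved
```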

Upvotes: 1
