Reputation: 4877
I'm trying to pass existing URLs as parameters and save their HTML into a single text
file:
for line in open('C:\Users\me\Desktop\URLS-HERE.txt'):
    if line.startswith('http') and line.endswith('html\n'):
        fichier = open("C:\Users\me\Desktop\other.txt", "a")
        allhtml = urllib.urlopen(line)
        fichier.write(allhtml)
        fichier.close()
But I get the following error:
TypeError: expected a character buffer object
Upvotes: 2
Views: 6190
Reputation: 2005
The value returned by urllib.urlopen()
is a file-like object. Once you have opened it, you should read its contents with the read()
method, as shown in the following snippet:
for line in open('C:\Users\me\Desktop\URLS-HERE.txt'):
    if line.startswith('http') and line.endswith('html\n'):
        fichier = open("C:\Users\me\Desktop\other.txt", "a")
        allhtml = urllib.urlopen(line)
        fichier.write(allhtml.read())
        fichier.close()
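To see why the call to read() matters, here is a minimal sketch written for Python 3, with in-memory io.BytesIO objects standing in for both the urlopen() response and the output file (both stand-ins are assumptions made for illustration, so no network access is needed). Writing the file-like object itself raises a TypeError, while writing the result of read() works. The exact error message differs from the Python 2 one quoted in the question.

```python
import io

# Stand-in for the object returned by urlopen(): file-like, holds the page body.
response = io.BytesIO(b"<html>hello</html>")
# Stand-in for the output file opened in binary mode.
target = io.BytesIO()

# Passing the file-like object itself to write() is rejected.
try:
    target.write(response)
    error = None
except TypeError as exc:
    error = exc

# Reading the content first yields bytes, which write() accepts.
target.write(response.read())
saved = target.getvalue()

print(type(error).__name__)
print(saved)
```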
Hope this helps!
Upvotes: 3
Reputation: 10541
The problem here is that urlopen
returns a file-like object, not the HTML itself. You have to call read() on it to retrieve the HTML:
for line in open(r"C:\Users\me\Desktop\URLS-HERE.txt"):
    if line.startswith('http') and line.endswith('html\n'):
        fichier = open(r"C:\Users\me\Desktop\other.txt", "a")
        allhtml = urllib2.urlopen(line)
        fichier.write(allhtml.read())
        fichier.close()
Please note that the urllib.urlopen
function has been marked as deprecated since Python 2.6. It's recommended to use urllib2.urlopen
instead.
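For readers on Python 3, where urllib2 was merged into urllib.request, the same loop might look roughly like the sketch below. The function name append_pages and its opener parameter are hypothetical, introduced here only so the fetching step can be swapped out; the file is opened in binary append mode because read() returns bytes in Python 3.

```python
from urllib.request import urlopen


def append_pages(url_file, out_file, opener=urlopen):
    """Append the body of every .html URL listed in url_file to out_file.

    `opener` is a hypothetical hook (defaults to urlopen) so the download
    step can be replaced, for example when trying the function offline.
    """
    with open(url_file) as urls, open(out_file, "ab") as out:
        for line in urls:
            url = line.strip()  # drop the trailing newline before testing the suffix
            if url.startswith("http") and url.endswith("html"):
                with opener(url) as response:
                    out.write(response.read())
```

Opening the output file once, outside the loop, avoids reopening it for every URL, and the with blocks close both files and each response even if a download fails.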
Additionally, you have to be careful when working with Windows paths in your code. You should either escape each backslash:
"C:\\Users\\me\\Desktop\\other.txt"
or put the r
prefix before the string. When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change:
r"C:\Users\me\Desktop\other.txt"
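A quick way to check that the two spellings are equivalent is to compare them directly. The sketch below also shows what can go wrong without either fix, using a made-up path (C:\temp\new.txt is an assumption chosen for illustration) whose backslashes happen to form the escape sequences \t and \n.

```python
# Both spellings produce exactly the same string.
escaped = "C:\\Users\\me\\Desktop\\other.txt"
raw = r"C:\Users\me\Desktop\other.txt"
print(escaped == raw)

# Hypothetical path: without escaping, \t becomes a tab and \n a newline.
risky = "C:\temp\new.txt"
safe = r"C:\temp\new.txt"
print(repr(risky))
print(repr(safe))
```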
Upvotes: 1