ZhaoGang

Reputation: 4915

Python BeautifulSoup: file.write(str) raises TypeError: write() argument must be str, not BeautifulSoup

I wrote the code below:

from bs4 import BeautifulSoup
import sys # where is the sys module in the source code folder ?

try:
    import urllib.request as urllib2 
except ImportError:
    import urllib2


print (sys.argv) # 
print(type(sys.argv)) # 

#baseUrl = "https://ecurep.mainz.de.xxx.com/ae5/"
baseUrl = "http://www.bing.com"
baseUrl = "http://www.sohu.com/"
print(baseUrl)

url = baseUrl 
page = urllib2.urlopen(url) #urlopen is a function, function is also an object
soup = BeautifulSoup(page.read(), "html.parser") #NameError: name 'BeautifulSoup' is not defined

html_file = open("Output.html", "w")
soup_string = str(soup)
print(type(soup_string))
html_file.write(soup_string) # TypeError: write() argument must be str, not BeautifulSoup
html_file.close()

Running it gives the following errors:

C:\hzg>py Py_logDownload2.py 1
['Py_logDownload2.py', '1']
<class 'list'>
http://www.sohu.com/
<class 'str'>
Traceback (most recent call last):
  File "Py_logDownload2.py", line 25, in <module>
    html_file.write(soup_string) # TypeError: write() argument must be str, not BeautifulSoup
  File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python35\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 376-377: character maps to <undefined>

But soup_string is obviously a str, so why does Python report the first error? I also don't know why the second one appears.
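Note that the "TypeError" text in the traceback above is not a second error: Python echoes the source line that was executing, and that line happens to carry an old comment. The only exception raised is the UnicodeEncodeError on the last line. A minimal, network-free sketch that reproduces it, assuming a made-up Chinese string stands in for the sohu.com page and an in-memory io.BytesIO stands in for Output.html opened with the cp1252 default:

```python
import io

# Made-up stand-in for str(soup): contains characters that cp1252 cannot represent.
page_text = "<title>搜狐</title>"

# TextIOWrapper encodes text the same way open(path, "w") does; here cp1252 is
# forced explicitly to mimic the Windows default seen in the traceback.
buffer = io.BytesIO()
writer = io.TextIOWrapper(buffer, encoding="cp1252")

error = None
try:
    writer.write(page_text)
    writer.flush()
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; "exc" itself is cleared after the except block

print(type(error).__name__, error)
```

The write fails during encoding, not because of the argument type, which is why the same script works for a page whose characters all exist in cp1252.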

What is more confusing: if I change the code to

baseUrl = "https://ecurep.mainz.de.xxx.com/ae5/"
#baseUrl = "http://www.bing.com"
#baseUrl = "http://www.sohu.com/"

it runs without any error. (I have changed the original host name to "xxx".)

Can anyone help to debug this?

Update:

I used to write Java code and am new to Python. With your help, I think I have made some progress in debugging Python: when debugging Java, I always try to solve the first error, while in Python I always try to solve the last one.
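A small sketch of why reading from the bottom works, using only the standard library's traceback module: Python prints the call chain oldest-first ("most recent call last"), so the exception type and message always end up on the final line. The functions `inner` and `outer` are hypothetical.

```python
import traceback

def inner():
    # Hypothetical failing call at the bottom of the stack.
    return 1 / 0

def outer():
    return inner()

try:
    outer()
except ZeroDivisionError as exc:
    # Build the same text the interpreter would print for an uncaught exception.
    text = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))

print(text)
last_line = text.strip().splitlines()[-1]  # the actual exception, e.g. type and message
```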

Upvotes: 1

Views: 1633

Answers (1)

gdlmx

Reputation: 6789

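The traceback shows that the file is being encoded with cp1252, the default on your Windows setup, which cannot represent some characters on the page (sohu.com is a Chinese-language site). Try opening your file with UTF-8 instead: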

import codecs
f = codecs.open("test", "w", "utf-8")
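In Python 3 the built-in open() also accepts an encoding argument, so the same fix can be applied without codecs. A minimal sketch, assuming a hypothetical helper `save_html` and a made-up string standing in for str(soup):

```python
import os
import tempfile

def save_html(html_text, path):
    """Hypothetical helper: write already-decoded HTML text to disk as UTF-8."""
    # encoding= overrides the locale default (cp1252 in the question's traceback).
    with open(path, "w", encoding="utf-8") as html_file:
        html_file.write(html_text)

# Made-up stand-in for str(soup) containing non-cp1252 characters.
soup_string = "<title>搜狐</title>"
out_path = os.path.join(tempfile.gettempdir(), "Output.html")
save_html(soup_string, out_path)

# Read it back with the same encoding to confirm nothing was lost.
with open(out_path, encoding="utf-8") as f:
    round_trip = f.read()
print(round_trip == soup_string)
```

The `with` block also closes the file even if write() raises, which the original explicit close() call would not do.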

You can also ignore encoding errors (not recommended) with:

f = codecs.open("test", "w", "utf-8", errors='ignore')
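To see why ignoring errors is discouraged, here is a stdlib-only sketch with a made-up string. With UTF-8 the handler rarely triggers, because UTF-8 can encode ordinary text; the data loss shows up with a narrow encoding such as cp1252, where `errors='ignore'` silently drops characters and `errors='replace'` at least leaves a visible `?`.

```python
# Made-up sample text with characters outside cp1252.
page_text = "<title>搜狐</title>"

# "ignore": unencodable characters vanish without any warning.
dropped = page_text.encode("cp1252", errors="ignore")
# "replace": each unencodable character becomes "?", so the loss is visible.
replaced = page_text.encode("cp1252", errors="replace")
# UTF-8 needs no error handler for this text and round-trips it exactly.
kept = page_text.encode("utf-8")

print(dropped)
print(replaced)
print(kept.decode("utf-8") == page_text)
```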

Upvotes: 3
