Reputation: 517
I'm new to Python and have a simple, situational question.
I'm trying to use BeautifulSoup to parse a series of pages:
from bs4 import BeautifulSoup
import urllib.request
BeautifulSoup(urllib.request.urlopen('http://bit.ly/'))
Traceback ...
html.parser.HTMLParseError: expected name token at '<!=KN\x01...
Working on Windows 7 64-bit with Python 3.2.
Do I need Mechanize? (which would entail Python 2.X)
Upvotes: 5
Views: 3228
Reputation: 691
Instead of urllib.request, I suggest using the requests library and its get() function:
from requests import get
from bs4 import BeautifulSoup
soup = BeautifulSoup(
    get(url="http://www.google.com").content,
    'html.parser'
)
Upvotes: 1
Reputation: 3740
If you want to download a file in Python, you can also use urlretrieve. The code below is Python 2 syntax; on Python 3 the function lives at urllib.request.urlretrieve.
import urllib
urllib.urlretrieve("http://bit.ly/xg7enD","myfile.mp3")
This saves the file in the current working directory under the name "myfile.mp3". I have been able to download all types of files this way.
Hope this helps!
Upvotes: 0
Reputation: 501
If you were trying to download that MP3, you could do something like this:
import urllib2
BLOCK_SIZE = 16 * 1024
req = urllib2.urlopen("http://bit.ly/xg7enD")
# Make sure to write as a binary file
fp = open("someMP3.mp3", 'wb')
try:
    while True:
        data = req.read(BLOCK_SIZE)
        if not data: break
        fp.write(data)
finally:
    fp.close()
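The snippet above uses urllib2, which exists only in Python 2. Since the question mentions Python 3.2, here is a rough sketch of the same chunked download using urllib.request. The helper names copy_in_blocks and download are made up for illustration, and the `with` form on the response assumes a reasonably recent Python 3.

```python
import urllib.request

BLOCK_SIZE = 16 * 1024  # same chunk size as the Python 2 version


def copy_in_blocks(source, target, block_size=BLOCK_SIZE):
    """Copy one binary file-like object to another in fixed-size chunks.

    Returns the total number of bytes written.
    """
    total = 0
    while True:
        data = source.read(block_size)
        if not data:  # an empty bytes object signals end of stream
            break
        target.write(data)
        total += len(data)
    return total


def download(url, filename):
    """Hypothetical helper: stream the response body at `url` into `filename`."""
    # Both context managers close their resource even if an error occurs.
    # The file is opened in binary mode, as in the original snippet.
    with urllib.request.urlopen(url) as response, open(filename, "wb") as fp:
        return copy_in_blocks(response, fp)
```

Calling download("http://bit.ly/xg7enD", "someMP3.mp3") should then behave like the loop above, reading the body 16 KiB at a time instead of loading it all into memory.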
Upvotes: 4
Reputation: 184231
If that URL is correct, you're asking why an HTML parser throws an error parsing an MP3 file. I believe the answer to this to be self-evident...
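One way to guard against this when crawling a series of links is to look at the Content-Type header before handing the body to the parser. The following is only a sketch: looks_like_html and fetch_soup are hypothetical helper names, and it assumes the server reports its media type honestly.

```python
import urllib.request


def looks_like_html(content_type):
    """Return True if a Content-Type header value names an HTML media type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")


def fetch_soup(url):
    """Hypothetical helper: parse `url` only if the server says it is HTML."""
    # Imported here so the header check above can be used without bs4 installed.
    from bs4 import BeautifulSoup

    # urlopen follows redirects, so the headers belong to the final response.
    with urllib.request.urlopen(url) as response:
        content_type = response.headers.get("Content-Type")
        if not looks_like_html(content_type):
            raise ValueError("not an HTML response: %r" % content_type)
        return BeautifulSoup(response.read(), "html.parser")
```

With a check like this, a short link that resolves to an audio file would raise a ValueError up front instead of failing inside the HTML parser.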
Upvotes: 26