Zack

Reputation: 517

BeautifulSoup HTMLParseError

I'm new to Python and have a simple, situational question.

I'm trying to use BeautifulSoup to parse a series of pages:

from bs4 import BeautifulSoup
import urllib.request

BeautifulSoup(urllib.request.urlopen('http://bit.ly/'))

Traceback ...

html.parser.HTMLParseError: expected name token at '<!=KN\x01...

Working on Windows 7 64-bit with Python 3.2.

Do I need Mechanize (which would mean switching to Python 2.x)?

Upvotes: 5

Views: 3228

Answers (4)

Jcc.Sanabria

Reputation: 691

Instead of urllib.request, I suggest using the requests library and its get() function:

from requests import get
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    get(url="http://www.google.com").content,
    'html.parser',
)

Upvotes: 1

sumit

Reputation: 3740

If you want to download a file in Python, you can also use this:

# Python 3: urlretrieve lives in urllib.request (it was urllib.urlretrieve in Python 2)
import urllib.request
urllib.request.urlretrieve("http://bit.ly/xg7enD", "myfile.mp3")

This saves the file in the current working directory under the name "myfile.mp3". I have been able to download all types of files this way.

Hope this helps!

Upvotes: 0

ChicoBird

Reputation: 501

If you were trying to download that MP3, you could do something like this:

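# Python 3: urllib2 was merged into urllib.request
import urllib.request

BLOCK_SIZE = 16 * 1024

req = urllib.request.urlopen("http://bit.ly/xg7enD")
# Make sure to write as a binary file
with open("someMP3.mp3", 'wb') as fp:
    # Read and write in fixed-size chunks so the whole file is never held in memory
    while True:
        data = req.read(BLOCK_SIZE)
        if not data:
            break
        fp.write(data)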

Upvotes: 4

kindall

Reputation: 184231

If that URL is correct, you're asking why an HTML parser throws an error parsing an MP3 file. I believe the answer to this to be self-evident...
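
One way to guard against this is to look at the response's Content-Type header before handing the body to the parser. The following is only a minimal sketch under that assumption: looks_like_html and fetch_html are hypothetical helper names made up for illustration, not part of BeautifulSoup or urllib, and a server can still send a misleading header.

```python
import urllib.request


def looks_like_html(content_type):
    """Return True if a Content-Type header value names an HTML media type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")


def fetch_html(url):
    """Fetch url and return its body as bytes; raise ValueError if it is not HTML."""
    with urllib.request.urlopen(url) as response:
        content_type = response.headers.get("Content-Type")
        if not looks_like_html(content_type):
            raise ValueError("not an HTML response: %r" % (content_type,))
        return response.read()
```

With a helper like this, the parse step would become BeautifulSoup(fetch_html(url), 'html.parser'), and a URL that resolves to an MP3 would fail with a clear ValueError instead of an HTMLParseError.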

Upvotes: 26
