BeautifulSoup not properly parsing script text/template

Question

I have a fairly complex template script (`<script type="text/template">`) that BeautifulSoup 4 doesn't parse correctly. As shown below, BS4 builds only part of the tree before giving up. Why does this happen, and is there a way to fix it?

>>> from bs4 import BeautifulSoup
>>> html = """ Other stuff I want to stay"""
>>> soup = BeautifulSoup(html)
>>> soup.findAll('script')
[]

Edit: on further testing, BS3 appears to parse this correctly:

>>> from BeautifulSoup import BeautifulSoup as bs3
>>> soup = bs3(html)
>>> soup.script

Victor Sigler · Accepted Answer

Beautiful Soup sometimes fails with its default parser. It supports the HTML parser included in Python's standard library, and it also supports a number of third-party Python parsers.

In some cases I have had to switch to a different parser, such as lxml or html5lib.
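To see what a parser does with a template script, here is a minimal sketch using only the standard library's `html.parser` module, which is the tokenizer behind Beautiful Soup's `"html.parser"` backend. The markup is a hypothetical stand-in, since the question's original HTML did not survive, and `ScriptInspector` is an illustrative name. It shows that the body of a `<script>` element, including one with `type="text/template"`, is handed over as raw text rather than as child tags.

```python
from html.parser import HTMLParser


class ScriptInspector(HTMLParser):
    """Record the start tags the stdlib tokenizer reports and the raw text
    it passes through for each <script> element (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.start_tags = []    # every start tag the tokenizer emitted
        self.script_texts = []  # raw contents of each <script> element
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)
        if tag == "script":
            self._in_script = True
            self.script_texts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Inside <script>, everything up to </script> arrives here as plain text,
        # possibly in several chunks, so append rather than overwrite.
        if self._in_script:
            self.script_texts[-1] += data


# Hypothetical markup standing in for the question's template.
markup = (
    '<div><script type="text/template"><p class="item">{{ name }}</p></script>'
    "<p>Other stuff I want to stay</p></div>"
)

inspector = ScriptInspector()
inspector.feed(markup)
inspector.close()

print(inspector.start_tags)    # ['div', 'script', 'p']
print(inspector.script_texts)  # ['<p class="item">{{ name }}</p>']
```

The inner `<p class="item">` never shows up as a start tag; it only exists as the text of the script element, so the template's markup will not appear as tags in the resulting tree.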

For example, to use lxml:

from bs4 import BeautifulSoup

# markup is the HTML string to parse; lxml must be installed (pip install lxml)
soup = BeautifulSoup(markup, "lxml")
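A quick way to check whether the parser is the culprit is to run the same markup through every installed parser and compare the results. The sketch below assumes `bs4` is installed; `count_scripts_per_parser` and `CANDIDATE_PARSERS` are hypothetical names, and the markup is an illustrative stand-in for the question's template. bs4 raises `FeatureNotFound` when a requested parser library is missing, so uninstalled parsers are simply skipped.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# "html.parser" ships with Python; "lxml" and "html5lib" are optional packages.
CANDIDATE_PARSERS = ("html.parser", "lxml", "html5lib")


def count_scripts_per_parser(markup, parsers=CANDIDATE_PARSERS):
    """Return {parser_name: number of <script> tags found} for installed parsers."""
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(markup, name)
        except FeatureNotFound:
            continue  # this parser's library is not installed; skip it
        counts[name] = len(soup.find_all("script"))
    return counts


# Hypothetical input; substitute the real template markup.
markup = (
    '<script type="text/template"><p>{{ name }}</p></script>'
    "<p>Other stuff I want to stay</p>"
)
counts = count_scripts_per_parser(markup)
print(counts)
```

If one parser reports a different count from the others for your real markup, pass that parser's competitor explicitly as the second argument to `BeautifulSoup`.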

I recommend reading the "Installing a parser" section of the documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
