BeautifulSoup cannot parse HTML tags that don't have a closing element

Question

Here is the HTML code I am working on:




sdasdsadsad

I want to get the line that contains the "meta" tag (name="description"), which doesn't have a closing element. Here is my code:



import glob, os, re, urllib2, codecs
from bs4 import BeautifulSoup
from bs4 import SoupStrainer


html_doc = """



sdasdsadsad










"""



soup = BeautifulSoup(html_doc)
aa = soup.find("meta", {"name":"description"})
print aa.encode("utf-8")


When I run this Python code, the console shows:









But if the "meta" tag has a closing element, I get exactly that line:

 


Could you tell me why BeautifulSoup returns all the HTML tags under the "meta" tag, and how to get only the line that contains it?


Thanks.

PepperoniPizza · Accepted Answer

Use the lxml module as the parser and it will work. I've tested it.

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'lxml')
aa = soup.find("meta", {"name":"description"})

print aa.encode('utf-8')

# console output
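
For anyone who wants to check the parser suggestion on their own machine, here is a minimal, self-contained sketch in Python 3. The sample markup is hypothetical (the original page is not reproduced here), and the helper name find_description is made up for illustration. It falls back to the built-in "html.parser" if lxml is not installed; recent bs4 releases appear to treat "meta" as a void tag there too, but that is an assumption worth verifying on your version.

```python
from bs4 import BeautifulSoup

# Pick lxml when available, as the answer suggests; otherwise fall back
# to the standard-library parser so the sketch still runs.
try:
    import lxml  # noqa: F401  (imported only to test availability)
    PARSER = "lxml"
except ImportError:
    PARSER = "html.parser"

# Hypothetical sample document, assumed for illustration only.
# Note that neither <meta> tag has a closing element.
html_doc = """
<html>
<head>
<meta name="description" content="sdasdsadsad">
<meta name="keywords" content="a, b, c">
<title>Example</title>
</head>
<body><p>Hello</p></body>
</html>
"""


def find_description(markup, parser=PARSER):
    """Return the <meta name="description"> tag from the parsed markup."""
    soup = BeautifulSoup(markup, parser)
    return soup.find("meta", {"name": "description"})


tag = find_description(html_doc)

# If the parser treats <meta> as a void element, the tag has no children,
# so printing it shows only that one tag and not the rest of the document.
print(tag)
print(tag["content"])
print(len(list(tag.children)))
```

If the last line prints a number greater than zero, the parser in use is nesting the following tags inside "meta", which is the behaviour described in the question.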
