BeautifulSoup parser adds unnecessary closing html tags

Question

For example, suppose you have HTML like this:

python:

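from bs4 import BeautifulSoup as bs
import urllib3

# Placeholder for the address of the HTML page
URL = 'html file'

http = urllib3.PoolManager()

# Fetch the page and parse the raw bytes with the built-in parser
page = http.request('GET', URL)
soup = bs(page.data, 'html.parser')

print(soup.prettify())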

If you parse it with BeautifulSoup in Python and print it with prettify(), you get output like this:

output:

But if the HTML contains a meta tag like

the output is left as it is: no closing tag is added.

So how do I stop BeautifulSoup from adding unnecessary closing tags?

Nihal · Accepted Answer

To solve this, switch the parser from html.parser to lxml. Note that lxml is a separate package, so it has to be installed first (for example with pip install lxml).

Your Python script then becomes:

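from bs4 import BeautifulSoup as bs
import urllib3

# Placeholder for the address of the HTML page
URL = 'html file'

http = urllib3.PoolManager()

# Fetch the page and parse the raw bytes with lxml instead of html.parser
page = http.request('GET', URL)
soup = bs(page.data, 'lxml')

print(soup.prettify())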

The only change is replacing soup = bs(page.data, 'html.parser') with soup = bs(page.data, 'lxml').
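
If you want to check the difference on your own markup before switching, a rough sketch like the one below parses the same string with both parsers and prints each result. The SAMPLE fragment and the render helper are made up for illustration (the HTML from the question is not reproduced here), so swap in the markup that actually causes the problem. The helper returns None when the requested parser is not installed instead of raising an error.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical fragment for illustration only; replace it with your own HTML.
SAMPLE = (
    '<head><meta charset="utf-8"><title>Demo</title></head>'
    '<body><p>Hello<br>world</p></body>'
)


def render(markup, parser):
    """Parse markup with the named parser and return the serialized tree.

    Returns None if BeautifulSoup cannot find the requested parser.
    """
    try:
        return str(BeautifulSoup(markup, parser))
    except FeatureNotFound:
        return None


# Serialize the same markup once per parser so the results can be compared.
results = {name: render(SAMPLE, name) for name in ("html.parser", "lxml")}

for name, output in results.items():
    print(name, "->", output if output is not None else "parser not installed")
```

Keep in mind that lxml also wraps fragments in html and body tags, so the two outputs can differ in more than just the closing tags.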
