Rajeev Srivastava

Reputation: 169

How to parse a big XML file using Beautiful Soup?

I am trying to parse an XML file named document.xml, which contains around 400,000 characters (including tags, line breaks, and spaces). Please find the code below:

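from bs4 import BeautifulSoup

# Read the whole file; the context manager closes it afterwards
with open('document.xml', 'r', encoding='utf-8') as document_xml_file_object:
    document_xml_file_content = document_xml_file_object.read()

xml_content = BeautifulSoup(document_xml_file_content, 'lxml-xml')
print("XML CONTENT: ", xml_content)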

When I print xml_content, this is my output:

XML CONTENT:  <?xml version="1.0" encoding="utf-8"?>

For smaller files it prints the complete XML. Can anyone help me understand why this is happening?

Edit : Click Here to see my XML Content.

Thanks in Advance

Upvotes: 0

Views: 561

Answers (1)

shantanoo

Reputation: 3704

For large files it is better to use a streaming, event-driven parser such as xml.sax. BeautifulSoup loads the whole file into memory and then parses it, whereas xml.sax processes the input as it reads it and uses far less memory.
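
A minimal sketch of the xml.sax approach, assuming you only need to react to elements as they stream past rather than hold the whole tree. The TagCounter class, the count_tags helper, and the sample XML are hypothetical names and data made up for illustration; they do not reflect the structure of the asker's document.xml.

```python
import io
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Counts how often each element name occurs, without building a tree."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per opening tag as the parser streams through the input.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_tags(source):
    """Parse `source` (a filename or a binary file object) and return tag counts."""
    handler = TagCounter()
    xml.sax.parse(source, handler)
    return handler.counts


# Small in-memory sample standing in for the real file (hypothetical content).
sample = io.BytesIO(
    b'<?xml version="1.0" encoding="utf-8"?><doc><p>a</p><p>b</p></doc>'
)
tag_counts = count_tags(sample)
print(tag_counts)
```

To run it against the real file, something like count_tags('document.xml') should work, since xml.sax.parse accepts a filename as well as a file object. To collect text instead of counting tags, the handler would also override the characters method.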

Upvotes: 1
