Parse XML with many roots using BeautifulSoup

Question

I am trying to parse a large XML file downloaded from Google using BS4. However, the file contains many root elements, so the XML parser only parses the first block.

I load the file using the following code:

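with open("test.xml") as f:  # pass the open file; the bare string "test.xml" would itself be parsed as markup
    xml = BeautifulSoup(f, "xml")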

The test.xml file looks like the following; it has many roots:




A LOT of information





A LOT of information


.......

The HTML parser can read the full file. However, a typical file of this kind contains over 10k roots, and reading it with the HTML parser is slow and uses up all my memory. Is there a way around this problem?

Any help is appreciated.


Answers (1)
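
One possible workaround, sketched here under assumptions rather than as a definitive solution: feed the file to an incremental parser inside a synthetic wrapper element, so the many roots become children of a single root, and handle each block as soon as it closes. The sketch uses the standard library's xml.etree.ElementTree.XMLPullParser instead of BeautifulSoup. The names iter_blocks, read_chunks and the wrapper tag are hypothetical. It also assumes the blocks are not each preceded by their own XML declaration or DOCTYPE line; if they are, those lines would have to be stripped before feeding, because a declaration is only allowed at the very start of a document.

```python
import itertools
import xml.etree.ElementTree as ET


def read_chunks(path, chunk_size=64 * 1024, encoding="utf-8"):
    """Yield the file as text chunks so it is never fully held in memory."""
    with open(path, encoding=encoding) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def iter_blocks(chunks):
    """Yield each top-level element from an iterable of text chunks.

    The chunks are parsed as if wrapped in one synthetic <wrapper> root
    (a hypothetical name), which makes the multi-root input well-formed.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed("<wrapper>")
    depth = 0
    wrapper = None
    for chunk in itertools.chain(chunks, ["</wrapper>"]):
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
                if depth == 1:
                    wrapper = elem  # the synthetic root
            else:
                depth -= 1
                if depth == 1:
                    # A complete original "root" has just closed.
                    yield elem
                    # Detach finished blocks from the wrapper so they
                    # can be garbage-collected.
                    wrapper.clear()


if __name__ == "__main__":
    # In-memory demo; for a real file use iter_blocks(read_chunks("test.xml")).
    demo = ["<a><x>1</x></a><b", "><y>2</y></b>"]
    for block in iter_blocks(demo):
        print(block.tag)
```

Because only one block is attached to the wrapper at a time, memory use should stay roughly proportional to the largest block rather than to the whole file. If the BeautifulSoup API is still wanted for the per-block work, each yielded element could be serialized with ET.tostring(block) and passed to BeautifulSoup(..., "xml"), one block at a time.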
