Michael Kročka

Reputation: 655

How to iteratively update an xml file that won't fit into memory?

I have a 10GB XML file extracted from the en-wikipedia-articles-pages-latest.xml dump; it contains every XML element whose text mentions the word "football". My goal is to produce a new output XML file holding only player names and their corresponding teams throughout the years. For example, when I come across the Lionel Messi page, I parse the infobox, which contains the information I need, and write it to the output XML file.

The problem: I may first come across an unknown footballer, or a footballer page with an old or broken infobox, and only later hit a football team's page that has correct information about that same player. By then the entry in the output XML has already been written, and it should be overwritten with the new information.

I can't keep the output XML as an in-memory object, because it is too large, and I don't want to sequentially scan the output file looking for a particular entry every time. Is there a general approach for handling this kind of situation?

Upvotes: 0

Views: 37

Answers (1)

Michael Kay

Reputation: 163360

One approach is to load the whole thing into an XML database such as eXist-db or BaseX.
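The win with a database is keyed, in-place updates: when a later page yields better data for a player, you replace that one record instead of rescanning a flat output file. As a rough illustration of that access pattern only (Python's `sqlite3` standing in here for a real XML database; the schema and names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # on disk this would be a file, not RAM
conn.execute("CREATE TABLE players (name TEXT PRIMARY KEY, teams TEXT)")

def upsert(name, teams):
    # Overwrite any earlier (e.g. broken-infobox) record for this player.
    conn.execute(
        "INSERT INTO players VALUES (?, ?) "
        "ON CONFLICT(name) DO UPDATE SET teams = excluded.teams",
        (name, teams),
    )

upsert("John Doe", "Unknown FC")           # first pass: broken infobox
upsert("John Doe", "Barcelona 2004-2021")  # a later page has the real data
row = conn.execute(
    "SELECT teams FROM players WHERE name = ?", ("John Doe",)
).fetchone()
```

In an XML database the same keyed replacement would be an XQuery Update rather than SQL, but the lookup-and-overwrite shape is identical.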

Another approach is to organise the work as a pipeline of streaming transformations (e.g. using XSLT 3.0). That's rather more work, but will ultimately be faster.
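A streaming pipeline chains transformations so that no stage ever materialises the whole document. The same shape in plain Python generators (a sketch only; the stage functions and the line-based input format are invented for illustration):

```python
def parse(lines):
    # Stage 1: pretend each input line is one parsed page record.
    for line in lines:
        title, _, body = line.partition("|")
        yield title, body

def only_football(pages):
    # Stage 2: keep only pages that mention football.
    for title, body in pages:
        if "football" in body:
            yield title, body

def to_records(pages):
    # Stage 3: emit the final (player, info) records.
    for title, body in pages:
        yield {"player": title, "info": body.strip()}

raw = ["Lionel Messi|football forward", "Physics|a science"]
records = list(to_records(only_football(parse(raw))))
```

Each stage pulls one item at a time from the previous one, so memory use is bounded by a single record regardless of input size.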

Upvotes: 2
