Reputation: 10209

Good Option for XML Edit/Replace

I have a huge (100k+ lines, 5MB+) XML file which acts as a database for my C++ application. The structure of the XML is quite straightforward; for example, it has chunks of:

<foo>
<bar prop="true"/>
<baz>blah</baz>
</foo>

The nesting of tags is several levels deep, and there are many items with multiple properties. What is a good way to find and replace chunks in this kind of file? For example, assume that the above section is repeated a few dozen times and that the value of the <baz> tag is different in each chunk. I'd like to make edits such as:

Setting all the values contained in tag <baz> to a given value.
Removing chunks that contain certain values.
Etc.

So far, I've learnt of the following methods for accomplishing this:

Find/Replace: A no-brainer, trivial solution and also my last fallback. IMHO, this approach is the most time-consuming, error-prone and painful method: the absolute last resort.
RegExes: Use regular expressions to match blocks of interest and edit them using replacement expressions, kinda like this blog entry: http://blogs.msdn.com/b/vseditor/archive/2004/08/12/213770.aspx. But I feel this would be error-prone, and a bunch of items could be missed if the regex is not exactly right the first time around.
Parser & Save: Whip up a quick program to parse the XML using Xerces or XML DOM interfaces (or some other XML library), read the XML in, manipulate it as desired and save it back to disk. Again, this is a slow process, but once it's up and running, it is easy to modify and more flexible than RegExes.
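To make the RegEx vs. Parser & Save trade-off concrete, here is a minimal sketch that sets every <baz> value both ways. It uses Python's standard library (`re` and `xml.etree.ElementTree`) as a stand-in for the C++ libraries mentioned, on the assumption that a one-off maintenance script need not be written in the application's own language; the sample data and the `set_baz_regex`/`set_baz_parser` helper names are hypothetical. The naive pattern silently skips a <baz> that carries an attribute, while the parser-based version updates every <baz> element however its tag is written.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: the second <baz> has an attribute the regex doesn't expect.
SAMPLE = """<db>
  <foo><bar prop="true"/><baz>blah</baz></foo>
  <foo><bar prop="false"/><baz kind="x">other</baz></foo>
</db>"""


def set_baz_regex(text: str, value: str) -> str:
    """Naive RegEx approach: only matches a bare <baz> start tag."""
    # A function replacement avoids backslash/group interpretation of `value`.
    # Note: `value` is inserted verbatim, so XML special characters are not escaped.
    return re.sub(r"<baz>[^<]*</baz>", lambda m: f"<baz>{value}</baz>", text)


def set_baz_parser(text: str, value: str) -> str:
    """Parser & Save approach: edit the parsed tree, then serialize it again."""
    root = ET.fromstring(text)
    for baz in root.iter("baz"):  # every <baz>, at any nesting depth
        baz.text = value          # ElementTree escapes &, < and > on output
    return ET.tostring(root, encoding="unicode")


print(set_baz_regex(SAMPLE, "NEW"))   # the <baz kind="x"> element still says "other"
print(set_baz_parser(SAMPLE, "NEW"))  # both <baz> elements now say "NEW"
```

For the real file, the parser version would use `ET.parse(path)` and `tree.write(path, encoding="utf-8", xml_declaration=True)`. Be aware that ElementTree drops comments by default and may normalize formatting (e.g. `<bar prop="true" />`), so diff the output before overwriting the original.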

Are there any better ways to deal with this? (EDIT: Thanks for all the "redo it to use a DB" suggestions. I know it's a huge mess, but by "better ways to deal with this" I meant the find/replace part.)

Upvotes: 3

Answers (3)

Steve Townsend

Reputation: 54178

What are your actual memory constraints? 5MB is large but not enormous by current RAM standards.

I would use DOM with XPath if you can; it will be a lot less development work than SAX or other stream-based parsing. My problem with SAX is that if you are really using this as an in-memory DB, that implies random, on-demand access, and SAX is not well suited to that: you will have to parse and reserialize over and over, whereas once you have the DOM you can at least manipulate it as you like.
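A minimal sketch of the DOM + XPath idea applied to the question's "remove chunks containing certain values" edit, again using Python's `xml.etree.ElementTree` as a stand-in for a C++ DOM library. ElementTree implements only a limited XPath subset (a `[tag='text']` predicate and the parent step `..` are supported); a full XPath 1.0 engine, such as the one in libxml2, accepts more expressive queries. The `remove_chunks` function and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def remove_chunks(root: ET.Element, chunk_tag: str, child_tag: str, value: str) -> int:
    """Remove every <chunk_tag> whose <child_tag> text equals `value`.

    Returns the number of removed chunks. Assumes `value` contains no single
    quote, since it is spliced into the XPath predicate as a literal.
    """
    predicate = f"{chunk_tag}[{child_tag}='{value}']"
    removed = 0
    # ElementTree elements have no parent pointer, so first select the
    # parents of matching chunks (the '..' step), then remove from each parent.
    for parent in root.findall(f".//{predicate}/.."):
        for chunk in parent.findall(predicate):
            parent.remove(chunk)
            removed += 1
    return removed


doc = ET.fromstring(
    "<db>"
    "<foo><bar prop='true'/><baz>blah</baz></foo>"
    "<foo><baz>keep</baz></foo>"
    "<group><foo><baz>blah</baz></foo></group>"
    "</db>"
)
print(remove_chunks(doc, "foo", "baz", "blah"))  # 2
print(ET.tostring(doc, encoding="unicode"))
```

Because the whole tree stays in memory, further queries and edits can be chained before a single save, which is the advantage over SAX described above.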

I'd also echo the comments about using something other than XML to store in-RAM database info; there are plenty of alternatives better suited to this. Maybe you could implement a tactical solution using DOM/XPath now and investigate rip-and-replace as a longer-term project.

Upvotes: 0

user7116

Reputation: 64138

Are there any better ways to deal with this?

If you must use XML, you could use an XML database such as BDB XML (which has C++ APIs). It supports XQuery, transactions, etc.

Other options include TinyXML, which I've used successfully in the past. It is quick and easy to use; not necessarily the fastest on a file of that size, but it will get the job done.

Upvotes: 1

Charles Brunet

Reputation: 23160

If you don't want to put the entire document in memory, I would read it using a SAX parser. As you read it, you append the transformed document to a second (or temporary) file. I think it could be pretty fast, with only a small memory footprint.
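A minimal sketch of that streaming approach, using Python's `xml.sax` and `xml.sax.saxutils.XMLGenerator` as a stand-in for a C++ SAX parser: each parse event is echoed straight to the output, except that the text inside every <baz> is replaced, so only the current event is held in memory. The `BazRewriter` class is hypothetical; the sketch assumes each <baz> contains only text, and it does not copy comments or a DOCTYPE through to the output.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class BazRewriter(XMLGenerator):
    """Stream every SAX event to `out`, replacing the text inside <baz>."""

    def __init__(self, out, new_value: str):
        # short_empty_elements=True keeps <bar prop="true"/> self-closing.
        super().__init__(out, encoding="utf-8", short_empty_elements=True)
        self._new_value = new_value
        self._baz_depth = 0  # > 0 while the parser is inside a <baz> element

    def startElement(self, name, attrs):
        super().startElement(name, attrs)
        if name == "baz":
            self._baz_depth += 1
            super().characters(self._new_value)  # write the replacement text once

    def endElement(self, name):
        if name == "baz":
            self._baz_depth -= 1
        super().endElement(name)

    def characters(self, content):
        if self._baz_depth == 0:  # drop the original text inside <baz>
            super().characters(content)


out = io.StringIO()
xml.sax.parseString(
    b'<db><foo><bar prop="true"/><baz>blah</baz></foo></db>',
    BazRewriter(out, "NEW"),
)
print(out.getvalue())
```

For the real file, something like `with open("db.xml", "rb") as src, open("db.new.xml", "w", encoding="utf-8") as dst: xml.sax.parse(src, BazRewriter(dst, "NEW"))` (file names hypothetical) writes the transformed document to a second file, which can replace the original once it has been checked.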

Upvotes: 2
