Reputation: 1481

How can I read a file in chunks based on a custom delimiter string?

I have a large file that looks like this:

<doc>
Hello
</doc>

<doc>
World
</doc>

// Some more repeating blocks

I want to read this file one chunk at a time, using </doc> as the delimiter. So I expect to get this as the first chunk:

<doc>
Hello
</doc>

And this as the second chunk:

<doc>
World
</doc>

I have read the documentation for BufRead, but read_until only accepts a single byte as its delimiter. How can I achieve something like this?

Upvotes: 0

Answers (1)

at54321

Reputation: 11856

I don't think there is an ideal one-size-fits-all solution to this problem. If you don't want to use a standard XML parser, here are some tips...

If you expect every chunk to end with </doc> on a line of its own, you can simply read with read_line() and compare each line against the delimiter (perhaps calling trim() on it first). If </doc> is not necessarily alone on its line, you can still use read_line(), but with some extra code that searches within the line. Neither idea is suitable, however, if you expect your XML to contain gigantic lines, since read_line() buffers a whole line in memory at once.
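As a rough illustration of the first idea, here is a minimal sketch that assumes </doc> always sits alone on its own line. The function name read_chunk and its signature are made up for this example; only BufRead::read_line and str::trim come from the standard library.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: reads the next chunk from `reader`, up to and
/// including the first line whose trimmed content equals `delimiter`.
/// Returns `Ok(None)` once the input is exhausted.
fn read_chunk<R: BufRead>(reader: &mut R, delimiter: &str) -> io::Result<Option<String>> {
    let mut chunk = String::new();
    let mut line = String::new();
    loop {
        line.clear();
        // read_line returns the number of bytes read; 0 means end of input.
        if reader.read_line(&mut line)? == 0 {
            break;
        }
        // Skip blank separator lines that appear before a chunk starts.
        if chunk.is_empty() && line.trim().is_empty() {
            continue;
        }
        chunk.push_str(&line);
        // Assumption: the delimiter is the only content on its line.
        if line.trim() == delimiter {
            return Ok(Some(chunk));
        }
    }
    // End of input: hand back any trailing, unterminated content.
    if chunk.is_empty() {
        Ok(None)
    } else {
        Ok(Some(chunk))
    }
}
```

You would wrap the file in a std::io::BufReader and call read_chunk(&mut reader, "</doc>") in a loop until it returns Ok(None). Only one chunk is held in memory at a time, but each line is still read whole, so the caveat about gigantic lines applies.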

Also, if you expect (and want to support) nested <doc> tags, that will complicate your manual parsing further.
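To give an idea of the extra bookkeeping, one possible approach is to keep a depth counter and only end a chunk when the outermost </doc> closes. This is only a sketch under strong assumptions: every <doc> and </doc> tag is alone on its line and carries no attributes. The name read_nested_chunk is hypothetical.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: reads one top-level <doc>...</doc> block,
/// including any <doc> blocks nested inside it.
/// Returns `Ok(None)` once the input is exhausted.
fn read_nested_chunk<R: BufRead>(reader: &mut R) -> io::Result<Option<String>> {
    let mut chunk = String::new();
    let mut line = String::new();
    let mut depth: usize = 0;
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            break;
        }
        match line.trim() {
            "<doc>" => depth += 1,
            // saturating_sub avoids underflow on a stray closing tag.
            "</doc>" => depth = depth.saturating_sub(1),
            // Ignore anything outside a <doc> block, such as blank lines.
            _ if depth == 0 => continue,
            _ => {}
        }
        chunk.push_str(&line);
        // Depth back at zero means the outermost block has just closed.
        if depth == 0 {
            return Ok(Some(chunk));
        }
    }
    // End of input: hand back any unterminated block as-is.
    if chunk.is_empty() {
        Ok(None)
    } else {
        Ok(Some(chunk))
    }
}
```

Anything beyond this (attributes, tags sharing a line, comments, CDATA) is where a real XML parser starts to pay off.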

Upvotes: 1
