Exp

Reputation: 206

Routines for parsing a text file in C++

I would like to parse a text file using C++. I know the syntax of the file, and from a computer science point of view I don't think I have any problems. However, I don't know exactly how to implement the parser in C++. I think there are a number of possibilities:

  1. flex/yacc: I think that the toolchain is a little outdated, and I don't think that it would work very well with the rest of my program.

  2. plain C: I could read the entire file into one char array and use pointers for random access. The problem is that the text files might be huge and I really wouldn't want to store them in memory the whole time.

  3. C++ istreams: The problem here is that parsing the file needs some kind of lookahead. If an expression doesn't match, I would have to put the characters I have read so far back into the stream. I think this would become rather ugly with the istream putback and unget functions. Also, since the expressions might be rather long, peek, which looks only one character ahead, is probably inadequate for me.

  4. Using boost: Boost supplies regular expressions which would be perfect to recognize tokens, but as far as my research goes, it is not possible to match regular expressions and consume the tokens within the context of an istream.

I also used JavaCC with Java a while back, and I have to say that I was very impressed with it. However, I don't think that there is anything like this in C++, is there?

I would really appreciate it if anyone with some experience in the area could point me in the right direction.

Upvotes: 2

Views: 1096

Answers (2)

You might also consider ANTLR as a parser generator.

Upvotes: 0

Joe McGrath

Reputation: 1501

If this is true:

plain C: I could read the entire file into one char array and use pointers for random access. The problem is that the text files might be huge and I really wouldn't want to store them in memory the whole time.

You should look into memory mapped files.

Iczelion has a good tutorial on the Windows API for memory mapped files here.

POSIX provides mmap(). Beej is apparently back online at a new address and provides an example of its use here.
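To give a rough idea of what the POSIX route could look like, here is a minimal sketch, not a finished implementation. It assumes a POSIX system and a file that is read-only while mapped. The names MappedFile and count_tokens are made up for illustration, and the whitespace tokenizer is only a stand-in for a real lexer. The point is that the parser sees a plain [begin, end) pointer range, so lookahead and backtracking are just pointer arithmetic, while the OS pages the file in on demand instead of the program holding a private copy of it.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Hypothetical RAII wrapper: maps a whole file read-only, unmaps on destruction.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd == -1) throw std::runtime_error("open failed: " + path);

        struct stat st;
        if (::fstat(fd, &st) == -1) {
            ::close(fd);
            throw std::runtime_error("fstat failed: " + path);
        }
        size_ = static_cast<std::size_t>(st.st_size);

        if (size_ > 0) {  // mmap rejects a zero length, so empty files stay unmapped
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) {
                ::close(fd);
                throw std::runtime_error("mmap failed: " + path);
            }
            data_ = static_cast<const char*>(p);
        }
        ::close(fd);  // the mapping remains valid after the descriptor is closed
    }

    ~MappedFile() {
        if (data_) ::munmap(const_cast<char*>(data_), size_);
    }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const char* begin() const { return data_; }
    const char* end() const { return data_ + size_; }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};

// Toy scanner standing in for a real lexer: counts whitespace-separated tokens
// in the range [p, end). Saving and restoring p is all backtracking would need.
std::size_t count_tokens(const char* p, const char* end) {
    std::size_t n = 0;
    while (p != end) {
        while (p != end && std::isspace(static_cast<unsigned char>(*p))) ++p;
        if (p == end) break;
        ++n;
        while (p != end && !std::isspace(static_cast<unsigned char>(*p))) ++p;
    }
    return n;
}
```

Usage would be along the lines of constructing MappedFile m("input.txt") and passing m.begin() and m.end() to the scanner. On a 32-bit system a very large file may not fit in the address space, in which case mapping it in windows (using the offset argument of mmap) would be needed.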

Boost also provides a single interface to the above that works in a platform-independent way. I don't know much about it because I would rather write something like this myself. I am sure it has its advantages. Boost has a page about it here.

Stack Overflow has a question about parsing a mmap()ed file here.

Upvotes: 1
