user3505334

Reputation: 19

C++ tokenization

I am writing a lexer in C++ that reads from a file character by character. How do I tokenize in this case? I can't use strtok, since I have individual characters rather than a string. Do I need to keep reading until I reach a delimiter?

Upvotes: 1

Views: 838

Answers (3)

Arun Sharma

Reputation: 517

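Based on the information provided: if you want to read up to a delimiter from a file, use the stream member function getline(char*, streamsize, char).

getline() reads at most n - 1 characters, or stops earlier if it reaches the delimiter. The delimiter is extracted from the stream but not stored in the buffer.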

Example:

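    #include <fstream>
    #include <iostream>
    using namespace std;

    int main()
    {
        ifstream f("test.cpp");        // open the file for reading
        char c[64];                    // getline needs real storage, not an uninitialized pointer
        f.getline(c, sizeof c, ' ');   // read up to 63 chars, or up to the first space
        cout << c;
    }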

Upvotes: 0

Matthieu M.

Reputation: 300349

There are multiple solutions.

The simplest thing to do is exactly that: keep a buffer (std::string) of the characters you already read until you reach a delimiter. At that point, you build a token from the accumulated characters in the buffer, clear the buffer, and push the delimiter (if necessary) in the buffer.
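The buffering approach described above could look roughly like the sketch below. The function names `is_delimiter` and `tokenize` and the delimiter set are assumptions made up for illustration, not part of any library. As a small variation, this version emits a non-whitespace delimiter as its own token straight away instead of pushing it back into the buffer.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical delimiter set, chosen only for this illustration.
bool is_delimiter(char ch) {
    return std::string(" \t\n;(){}").find(ch) != std::string::npos;
}

// Reads `in` one character at a time and returns the tokens found.
std::vector<std::string> tokenize(std::istream& in) {
    std::vector<std::string> tokens;
    std::string buffer;  // characters of the token currently being built
    char ch;
    while (in.get(ch)) {
        if (!is_delimiter(ch)) {
            buffer += ch;  // still inside a token: keep accumulating
            continue;
        }
        if (!buffer.empty()) {  // delimiter reached: flush the buffer
            tokens.push_back(buffer);
            buffer.clear();
        }
        // Whitespace is dropped; any other delimiter becomes a token itself.
        if (ch != ' ' && ch != '\t' && ch != '\n') {
            tokens.push_back(std::string(1, ch));
        }
    }
    if (!buffer.empty()) {  // flush whatever is left at end of input
        tokens.push_back(buffer);
    }
    return tokens;
}
```

Since `std::ifstream` is a `std::istream`, an open file can be passed to `tokenize` directly; a `std::istringstream` is handy for trying it out on a literal string.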

Another solution would be to read ahead: i.e., pick up the entire line with std::getline (for example), and then examine what is on that line. In general the end-of-line is a natural token delimiter.
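A minimal sketch of the read-ahead variant, assuming tokens within a line are separated by whitespace only (a real lexer would need finer rules). The name `tokenize_lines` is hypothetical; `std::getline` and `std::istringstream` are standard.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Returns one vector of whitespace-separated tokens per input line.
std::vector<std::vector<std::string>> tokenize_lines(std::istream& in) {
    std::vector<std::vector<std::string>> result;
    std::string line;
    while (std::getline(in, line)) {    // the end-of-line acts as a delimiter
        std::istringstream words(line); // re-scan the line that was just read
        std::vector<std::string> tokens;
        std::string word;
        while (words >> word) {         // operator>> skips whitespace
            tokens.push_back(word);
        }
        result.push_back(tokens);
    }
    return result;
}
```

Keeping the tokens grouped per line preserves line boundaries, which is useful later for error messages with line numbers.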

This works well... when delimiters are easy.

Unfortunately some languages, like C++, have awkward grammars. For example, in C++ >> can be either:

  • the operator >> (for right-shift and stream extraction)
  • the end of two nested templates (ie could be rewritten as > >)

In those cases... well, just don't bother with the difference in the tokenizer. Emit the same token either way and let your AST-building pass disambiguate, since it has more information.
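One way to follow that advice is to have the tokenizer always emit `>>` as a single token by peeking one character ahead, and leave the question of what it means to the parser. The sketch below is only an illustration under simplifying assumptions: `lex_angles` is a made-up name, and everything that is not an identifier character or whitespace is treated as a one-character token, except `>>`.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Splits input into words and punctuation, always lexing ">>" as one token.
std::vector<std::string> lex_angles(std::istream& in) {
    std::vector<std::string> tokens;
    std::string word;
    char ch;
    while (in.get(ch)) {
        unsigned char uch = static_cast<unsigned char>(ch);
        if (std::isalnum(uch) || ch == '_') {
            word += ch;  // still inside an identifier or number
            continue;
        }
        if (!word.empty()) {  // any other character ends the current word
            tokens.push_back(word);
            word.clear();
        }
        if (std::isspace(uch)) {
            continue;  // whitespace only separates tokens
        }
        if (ch == '>' && in.peek() == '>') {
            in.get(ch);  // consume the second '>'
            tokens.push_back(">>");  // shift or template close: the parser decides
        } else {
            tokens.push_back(std::string(1, ch));
        }
    }
    if (!word.empty()) {
        tokens.push_back(word);
    }
    return tokens;
}
```

For the input `vector<vector<int>> v; a >> b`, both occurrences come out as the same `>>` token; a later pass that knows it is inside a template argument list can split the first one into two closing brackets.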

Upvotes: 0

nvoigt

Reputation: 77364

The answer is yes. You need to keep reading until you hit a delimiter.

Upvotes: 2
