Reputation: 19
I am writing a lexer in C++ and I am reading from a file character by character. However, how do you do tokenization in this case? I can't use strtok since I have a character, not a string. Somehow I need to keep reading until I reach a delimiter?
Upvotes: 1
Views: 838
Reputation: 517
Based on the information you provided: if you want to read up to a delimiter from a file, use the getline(char *, int, char) member function.
getline() is used to read up to n characters or up to a delimiter.
Example:
#include <fstream>
#include <iostream>
using namespace std;

int main()
{
    fstream f;
    f.open("test.cpp", ios::in);
    char c[2];               // the caller must supply the buffer; an uninitialized char* is undefined behaviour
    f.getline(c, 2, ' ');    // reads up to 1 char or until a space
    cout << c;
    return 0;
}
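If a fixed-size char buffer is undesirable, the free function std::getline with a std::string is a common alternative. A minimal sketch, reusing the same placeholder file name test.cpp:
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    ifstream f("test.cpp");        // placeholder file name
    string word;
    getline(f, word, ' ');         // read until the first space (or end of file)
    cout << word << '\n';
    return 0;
}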
Upvotes: 0
Reputation: 300349
There are multiple solutions.
The simplest thing to do is exactly that: keep a buffer (a std::string) of the characters you have already read until you reach a delimiter. At that point, you build a token from the accumulated characters in the buffer, clear the buffer, and push the delimiter (if necessary) into the buffer.
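A minimal sketch of that buffer-and-flush approach, assuming whitespace and ';' as example delimiters and a hypothetical emit() helper standing in for real token construction:
#include <cctype>
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical token sink; a real lexer would build token objects instead.
void emit(const std::string& text)
{
    if (!text.empty())
        std::cout << "token: " << text << '\n';
}

int main()
{
    std::ifstream in("input.txt");   // placeholder file name
    std::string buffer;
    char c;
    while (in.get(c)) {
        if (std::isspace(static_cast<unsigned char>(c)) || c == ';') {
            emit(buffer);            // flush the accumulated characters
            buffer.clear();
            if (c == ';')            // delimiters that are tokens themselves
                emit(std::string(1, c));
        } else {
            buffer += c;             // keep accumulating
        }
    }
    emit(buffer);                    // last token, if the file didn't end on a delimiter
    return 0;
}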
Another solution would be to read ahead of time: i.e., pick up the entire line with std::getline (for example), and then check what's on that line. In general the end of line is a natural token delimiter.
This works well... when delimiters are easy.
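A minimal sketch of that line-based approach, assuming a placeholder file name input.txt and whitespace-separated tokens:
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::ifstream in("input.txt");                 // placeholder file name
    std::string line;
    while (std::getline(in, line)) {               // end of line is a natural delimiter
        std::istringstream tokens(line);
        std::string word;
        while (tokens >> word)                     // split the line on whitespace
            std::cout << "token: " << word << '\n';
    }
    return 0;
}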
Unfortunately some languages, like C++, have awkward grammars. For example, in C++ >> can be either:
- >> (the right-shift / stream-extraction operator)
- > > (two closing angle brackets of nested templates)
In those cases... well, just don't bother with the difference in the tokenizer, and let your AST-building pass disambiguate; it has more information.
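One way to read that advice, as a hedged sketch: the lexer always emits a single ">>" token, and the parser splits it only when it is actually closing two template argument lists. The Token type and the expectClosingAngle helper below are made up for illustration:
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type for illustration.
struct Token { std::string text; };

// When the parser expects a closing '>' of a template argument list and the
// current token is ">>", split it in two rather than making the lexer guess.
// Error handling is omitted to keep the sketch short.
void expectClosingAngle(std::vector<Token>& tokens, std::size_t& pos)
{
    if (tokens[pos].text == ">>") {
        tokens[pos].text = ">";                               // consume one '>'
        tokens.insert(tokens.begin() + pos + 1, Token{">"});  // leave the other for the caller
    }
    ++pos;  // one closing '>' consumed
}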
Upvotes: 0
Reputation: 77364
The answer is Yes. You need to keep reading until you hit a delimiter.
Upvotes: 2