user2754070

Reputation: 507

Tokenization of strings in C++

I am using the following code to split each line of a file into tokens and print one token per output line. My problem is that the token index restarts at 0 on every line; I want the index to keep counting across the whole file. The contents of the file are:

Student details:
Highlander 141A Section-A.
Single 450988012 SA

Program:

#include <iostream>
using std::cout;
using std::endl;

#include <fstream>
using std::ifstream;

#include <cstring>

const int MAX_CHARS_PER_LINE = 512;
const int MAX_TOKENS_PER_LINE = 20;
const char* const DELIMITER = " ";

int main()
{
  // create a file-reading object
  ifstream fin;
  fin.open("data.txt"); // open a file
  if (!fin.good()) 
    return 1; // exit if file not found

  // read each line of the file
  while (!fin.eof())
  {
    // read an entire line into memory
    char buf[MAX_CHARS_PER_LINE];
    fin.getline(buf, MAX_CHARS_PER_LINE);

    // parse the line into blank-delimited tokens
    int n = 0; // a for-loop index

    // array to store memory addresses of the tokens in buf
    const char* token[MAX_TOKENS_PER_LINE] = {}; // initialize to 0

    // parse the line
    token[0] = strtok(buf, DELIMITER); // first token
    if (token[0]) // zero if line is blank
    {
      for (n = 1; n < MAX_TOKENS_PER_LINE; n++)
      {
        token[n] = strtok(0, DELIMITER); // subsequent tokens
        if (!token[n]) break; // no more tokens
      }
    }

    // process (print) the tokens
    for (int i = 0; i < n; i++) // n = #of tokens
      cout << "Token[" << i << "] = " << token[i] << endl;
    cout << endl; // blank line after each input line
  }
}

Output:

Token[0] = Student
Token[1] = details:

Token[0] = Highlander
Token[1] = 141A
Token[2] = Section-A.

Token[0] = Single
Token[1] = 450988012
Token[2] = SA

Expected:

Token[0] = Student
Token[1] = details:

Token[2] = Highlander
Token[3] = 141A
Token[4] = Section-A.

Token[5] = Single
Token[6] = 450988012
Token[7] = SA

So I want the index to keep incrementing from one line to the next, so that I can easily identify each value by its name (for example Token[5]). Thanks in advance.

Upvotes: 1

Views: 2506

Answers (2)

Olaf Dietsche

Reputation: 74028

I would just let iostream do the splitting:

std::vector<std::string> token;
std::string s;
while (fin >> s)
    token.push_back(s);

Then you can output the whole array at once with proper indexes.

for (std::size_t i = 0; i < token.size(); ++i)
    cout << "Token[" << i << "] = " << token[i] << endl;

Update:

You can even omit the vector altogether and output the tokens as you read them from the input stream:

std::string s;
for (int i = 0; fin >> s; ++i)
    std::cout << "Token[" << i << "] = " << s << std::endl;

Upvotes: 0

James Kanze

Reputation: 153909

What's wrong with the standard, idiomatic solution:

int i = 0;  // declared outside the loop so the index keeps counting across lines
std::string line;
while ( std::getline( fin, line ) ) {
    std::istringstream parser( line );
    std::string token;
    while ( parser >> token ) {
        std::cout << "Token[" << i << "] = " << token << std::endl;
        ++ i;
    }
}

Obviously, in real life, you'll want to do more than just output each token, and you'll want more complicated parsing. But any time you're doing line-oriented input, the above is the model you should be using (probably keeping track of the line number as well, for error messages).
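As a rough sketch of the line-number tracking mentioned above: the `Token` struct and the `tokenize_lines` function are hypothetical names, not part of any library, and the stream is passed in as a generic `std::istream&` by assumption.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: a token plus the 1-based line it was found on.
struct Token {
    std::string text;
    std::size_t line;
};

// Read `in` line by line, split each line on whitespace, and remember
// which line every token came from. A token's position in the returned
// vector is its running index across the whole input.
std::vector<Token> tokenize_lines(std::istream& in)
{
    std::vector<Token> tokens;
    std::string line;
    std::size_t line_number = 0;
    while (std::getline(in, line)) {
        ++line_number;                      // blank lines are counted too
        std::istringstream parser(line);
        std::string word;
        while (parser >> word)
            tokens.push_back(Token{word, line_number});
    }
    return tokens;
}
```

A caller can then print `tokens[i].text` under the running index `i`, and quote `tokens[i].line` when reporting a parse error.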

It's probably worth pointing out that in this case, an even better solution would be to use boost::split in the outer loop, to get a vector of tokens.
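If Boost is not available, a small standard-library stand-in gives the same "line in, vector of tokens out" shape. This is only a sketch: `split_line` is a made-up name, the default delimiter set is an assumption, and its handling of empty pieces may differ from what `boost::split` does.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a split function: cut `line` at any character
// found in `delims`, dropping the empty pieces that leading, trailing or
// adjacent delimiters would otherwise produce.
std::vector<std::string> split_line(const std::string& line,
                                    const std::string& delims = " \t")
{
    std::vector<std::string> pieces;
    std::size_t start = line.find_first_not_of(delims);
    while (start != std::string::npos) {
        std::size_t end = line.find_first_of(delims, start);
        // When `end` is npos, substr simply runs to the end of the string.
        pieces.push_back(line.substr(start, end - start));
        start = line.find_first_not_of(delims, end);
    }
    return pieces;
}
```

In the outer `std::getline` loop you would call `split_line(line)` once per line and add the size of the returned vector to a running count to keep the indexes continuous.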

Upvotes: 2
