Reputation: 10122
I see how to tokenise a string in the traditional manner (e.g. the answers to "How do I tokenize a string in C++?"), but how can I split a string on a known set of tokens and keep the text between them as well?
For example, given a date/time picture such as yyyy\MMM\dd HH:mm:ss, I would like to split it into an array containing the following:
"yyyy", "\", "MMM", "\", "dd", " " , "HH", ":", "mm", ":", "ss"
The "tokens" are yyyy, MMM, dd, HH, mm, ss in this example. I don't know what the separators are, only what the tokens are. However, the separators need to appear in the final result. The complete list of tokens is:
"yyyy" // – four-digit year, e.g. 1996
"yy" // – two-digit year, e.g. 96
"MMMM" // – month spelled out in full, e.g. April
"MMM" // – three-letter abbreviation for month, e.g. Apr
"MM" // – two-digit month, e.g. 04
"M" // – one-digit month for months below 10, e.g. 4
"dd" // – two-digit day, e.g. 02
"d" // – one-digit day for days below 10, e.g. 2
"ss" // – two-digit second
"s" // – one-digit second for seconds below 10
"mm" // – two-digit minute
"m" // – one-digit minute for minutes below 10
"tt" // – AM/PM designator
"t" // – first character of AM/PM designator
"hh" // – two-digit hour, 12-hour clock
"h" // – one-digit hour for hours below 10, 12-hour clock
"HH" // – two-digit hour, 24-hour clock
"H" // – one-digit hour for hours below 10, 24-hour clock
I've noticed that std::string isn't very strong on parsing and tokenising, and I can't use Boost. Is there a tight, idiomatic solution? I'd hate to break out a C-style algorithm for doing this. Performance isn't a consideration.
Upvotes: 1
Views: 170
Reputation: 4668
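Perhaps strtok (http://www.cplusplus.com/reference/cstring/strtok/) is what you're looking for; that page includes a useful example.
However, it eats the delimiters: each one is overwritten with a null terminator. You can work out where each delimiter was by comparing the returned pointer with the base pointer and moving forward by the token's length. To get the delimiter character itself, look up that index in an untouched copy of the string.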
#include <cstring>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main()
{
    char data[] = "yyyy\\MMM\\dd HH:mm:ss";
    std::vector<std::string> tokens;
    char* pch = std::strtok(data, "\\:"); // pch points at "yyyy"
    while (pch != NULL)
    {
        tokens.push_back(pch);
        // Offset of the delimiter that ended this token: 4, 8, ...
        int delimiterIndex = static_cast<int>(pch - data + std::strlen(pch));
        std::stringstream ss;
        ss << delimiterIndex;
        tokens.push_back(ss.str());
        pch = std::strtok(NULL, "\\:"); // pch points at "MMM", "dd HH", ...
    }
    for (const auto& token : tokens)
    {
        std::cout << token << ", ";
    }
}
This gives output of:
yyyy, 4, MMM, 8, dd HH, 14, mm, 17, ss, 20,
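Note that "dd HH" comes back as a single piece, because the space is not in the delimiter set passed to strtok. Since the question says only the tokens are known, not the separators, another option is to scan for the tokens directly and treat everything else as separator text. The following is only a minimal sketch of that idea, not a tested library routine: splitPicture and kTokens are names made up for illustration, and it assumes that trying the longer tokens first (so "MMMM" is tried before "MMM") is the matching rule you want.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a date/time picture into the known tokens and
// the runs of separator text between them, in their original order.
std::vector<std::string> splitPicture(const std::string& picture)
{
    // Each longer token is listed before its shorter prefixes, so the first
    // token that matches at a position is also the longest one.
    static const char* const kTokens[] = {
        "yyyy", "yy", "MMMM", "MMM", "MM", "M", "dd", "d",
        "ss", "s", "mm", "m", "tt", "t", "hh", "h", "HH", "H"
    };

    std::vector<std::string> parts;
    std::string separator; // collects characters that start no token
    std::size_t pos = 0;
    while (pos < picture.size())
    {
        std::string match;
        for (const char* token : kTokens)
        {
            const std::string candidate(token);
            // compare() clips at the end of picture, so a short tail never matches.
            if (picture.compare(pos, candidate.size(), candidate) == 0)
            {
                match = candidate;
                break;
            }
        }
        if (match.empty())
        {
            separator += picture[pos++]; // not a token: part of a separator
            continue;
        }
        if (!separator.empty())
        {
            parts.push_back(separator); // flush the separator before the token
            separator.clear();
        }
        parts.push_back(match);
        pos += match.size();
    }
    if (!separator.empty())
    {
        parts.push_back(separator); // trailing separator, if any
    }
    return parts;
}
```

Calling splitPicture("yyyy\\MMM\\dd HH:mm:ss") should give the eleven pieces listed in the question. One caveat: any token letter inside literal text is also treated as a token (the "d" in "day", for example), so pictures containing literal words would need some quoting rule on top of this.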
Upvotes: 1