Reputation: 2513
I'm looking for a method to split the following line of text into an array.
Here is some text\r\n"here is another line"\r\nAnd another line
Such that the resultant array is:
Here is some text
\r\n
"
here is another line
"
\r\n
And another line
Note there are essentially two delimiters here, " and \r\n.
I need to do this in C++ and there could be additional delimiters in the future.
Any ideas?
Thanks in advance.
Edit: No, this is not homework.
Here's what I have so far:
const RWCString crLF = "\r\n";
const RWCString doubleQuote = "\"";
RWTValOrderedVector<RWCString> Split(const RWCString &value, const RWCString &specialContent)
{
RWTValOrderedVector<RWCString> result;
size_t index = 0; // size_t, so the comparison against RW_NPOS also holds where size_t is wider than unsigned
RWCString str = value;
while ( ( index = str.index( specialContent, 0, RWCString::ignoreCase ) ) != RW_NPOS )
{
RWCString line = str(0, index);
result.append(line);
result.append(specialContent);
str = str(index, str.length() - index);
str = str(specialContent.length(), str.length() - specialContent.length());
}
if (str.length() > 0)
{
result.append(str);
}
return result;
}
void replaceSpecialContents(const RWCString &value)
{
RWTValOrderedVector<RWCString> allStrings;
RWTValOrderedVector<RWCString> crLFStrings = Split(value, crLF);
for (unsigned i=0; i<crLFStrings.entries(); i++)
{
RWTValOrderedVector<RWCString> dqStrings = Split(crLFStrings[i], doubleQuote);
if (dqStrings.entries() == 1)
{
allStrings.append(crLFStrings[i]);
}
else
{
for (unsigned j=0; j<dqStrings.entries(); j++)
{
allStrings.append(dqStrings[j]);
}
}
}
}
Upvotes: 2
Views: 4676
Reputation: 4875
Here's a way to do it that will work in C and C++:
//String to tokenize:
char str[] = "let's get some tokens!";
//A set of delimiters:
char delims[] = " ";
//List of tokens:
char *tok1 = NULL,
*tok2 = NULL,
*tok3 = NULL;
//Tokenize the string:
tok1 = strtok(str, delims);
tok2 = strtok(NULL, delims); //after you get the first token
tok3 = strtok(NULL, delims); //supply "NULL" as first strtok parameter
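You can modify this in various ways. You can put all the "strtok(NULL, delims)" calls in a loop to make it more flexible, or you can interface with a C++ string by copying its contents into a writable buffer first (strtok modifies its input, so the const pointer returned by .c_str() cannot be passed to it directly), etc.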
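As a rough sketch of the loop idea above, the following wraps strtok in a function that accepts a std::string. The name strtokSplit is made up for illustration. Note that strtok discards the delimiters and skips empty tokens, so on its own it does not produce the delimiter entries the question asks for.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): splits `input` on any character
// in `delims` using strtok. Delimiters are dropped and empty tokens skipped.
std::vector<std::string> strtokSplit(const std::string& input, const char* delims)
{
    // strtok writes into its argument, so work on a mutable,
    // null-terminated copy instead of input.c_str().
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(&buffer[0], delims);
         tok != NULL;
         tok = std::strtok(NULL, delims))   // NULL continues with the same buffer
    {
        tokens.push_back(tok);
    }
    return tokens;
}
```

Calling strtokSplit("let's get some tokens!", " ") should give the four tokens let's, get, some and tokens!. Keep in mind that strtok holds internal state, so this sketch is not safe to call from several threads at once.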
Upvotes: 2
Reputation: 8116
Building on the Rogue Wave SourcePro API you're using, you could use RWTRegex to split the string into tokens:
RWTValOrderedVector<RWCString> tokenize(const RWCString& str)
{
RWTRegex<char> re("\\r\\n|\"|([^\"\\r]|\\r[^\\n])*|\\r$");
RWTRegex<char>::iterator it(re, str);
RWTValOrderedVector<RWCString> result;
for (; it != RWTRegex<char>::iterator(); ++it) {
result.append(it->subString(str));
}
return result;
}
For details on RWTRegex see http://www.roguewave.com/Portals/0/products/sourcepro/docs/12.0/html/sourceproref/classRWTRegex.html.
Upvotes: 1
Reputation: 2987
strtok overwrites each delimiter it finds with a null character ('\0'). That's why the tokens it returns do not include the delimiters.
See man strtok for more information. I'm also playing around with strtok and strtok_r, as I have an incoming char array like the following:
Hello~Milktea~This is my message\r\nMessage~I have a good watch~Cartier\r\n
I am going to first strip the ~ (tildes) followed by the \r\n, or vice versa.
Upvotes: 0
Reputation: 264331
A really simple way is to just use flex:
You can build a really simple lexer for a C++ application in a few lines, and it stays very readable.
I would note that you should be careful with '\r\n'. If you open a file in text mode (the default) then the standard stream reading will convert the standard line termination sequence into a '\n'. On some platforms the end of line termination sequence is '\r\n' and thus if you read a stream from a file you may only see a '\n' character.
%option c++
%option noyywrap
%%
\" return 1;
\r\n return 2;
[^"\r\n]* return 3;
%%
#include <iostream>
#include <string>
#include "FlexLexer.h"
int main()
{
yyFlexLexer lexer(&std::cin, &std::cout);
int token;
while((token = lexer.yylex()) != 0)
{
std::string tok(lexer.YYText(), lexer.YYText() + lexer.YYLeng());
std::cout << "Token: " << token << "(" << tok << ")\n";
}
}
% flex split.lex
% g++ main.cpp lex.yy.cc
% cat testfile | ./a.exe
Token: 3(Here is some text)
Token: 2(
)
Token: 1(")
Token: 3(here is another line)
Token: 1(")
Token: 2(
)
Token: 3(And another line)
Upvotes: 1
Reputation: 18652
Here is a method that uses TR1 regex features.
std::string text("Here is some text\r\n\"here is another line\"\r\nAnd another line");
std::vector<std::string> vec;
std::regex rx("[\\w ]+|\\r\\n|\"");
std::sregex_iterator rxi(text.begin(), text.end(), rx), rxend;
for (; rxi != rxend; ++rxi)
{
vec.push_back(rxi->str());
}
In my testing, this populates the vector with the 7 substrings in your example. I'm no expert so there may be a more correct regular expression than the one I'm using.
Upvotes: 0
Reputation: 76519
You can use string::find_first_of and string::substr. Just be careful to check for "empty" strings; find_first_of will find chars, so \r and \n will both be split off by the resulting algorithm.
Alternatively, iterate over the whole string, and copy the previous part when you come across another delimiter.
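A minimal sketch of the find_first_of / substr approach, assuming the delimiters should be kept as separate elements as in the question. The function name splitKeepingDelims is hypothetical, and the special case that treats "\r\n" as a single delimiter is an addition of this sketch to get around the per-character splitting mentioned above.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `text` at every character in `delimChars`
// and keeps each delimiter as its own element of the result.
std::vector<std::string> splitKeepingDelims(const std::string& text,
                                            const std::string& delimChars)
{
    std::vector<std::string> parts;
    std::string::size_type pos = 0;
    while (pos < text.size())
    {
        std::string::size_type hit = text.find_first_of(delimChars, pos);
        if (hit == std::string::npos)
        {
            parts.push_back(text.substr(pos));             // trailing text
            break;
        }
        if (hit > pos)                                     // skip "empty" strings
            parts.push_back(text.substr(pos, hit - pos));

        if (text.compare(hit, 2, "\r\n") == 0)             // keep CR LF together
        {
            parts.push_back("\r\n");
            pos = hit + 2;
        }
        else
        {
            parts.push_back(std::string(1, text[hit]));    // single-char delimiter
            pos = hit + 1;
        }
    }
    return parts;
}
```

With the delimiter set "\"\r\n", the example string from the question should come back as the seven elements listed there. Adding another single-character delimiter later only means extending delimChars; a new multi-character delimiter would need its own special case like the one for "\r\n".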
Upvotes: 1
Reputation: 19104
Bisect the problem as follows:
Now, solve 1 and 2. If any problem, ask again.
Upvotes: 1
Reputation: 25551
getline has an optional delimiter, so you can use stringstream to do it with very little effort on your part. The downside is that (I believe) it only works with one delimiter at a time.
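For illustration, here is a small sketch of the getline approach with a single delimiter character. The function name splitOnChar is made up. As noted, std::getline takes only one delimiter character and discards it, so handling both " and \r\n this way would take several passes plus some bookkeeping to put the delimiters back.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on a single delimiter character
// using std::getline on a string stream. The delimiter is not kept.
std::vector<std::string> splitOnChar(const std::string& text, char delim)
{
    std::vector<std::string> fields;
    std::istringstream stream(text);
    std::string field;
    while (std::getline(stream, field, delim))   // reads up to the next delim
    {
        fields.push_back(field);
    }
    return fields;
}
```

Unlike strtok, this keeps empty fields between adjacent delimiters, so splitOnChar("a,b,,c", ',') should return four elements, the third of which is empty.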
Upvotes: 1