Carl

Reputation: 2513

How to split these strings into an array

I'm looking for a method to split the following line of text into an array.

Here is some text\r\n"here is another line"\r\nAnd another line

Such that the resultant array is:

Here is some text

\r\n

"

here is another line

"

\r\n

And another line

Note there are essentially two delimiters here, " and \r\n.
I need to do this in C++, and there could be additional delimiters in the future.
Any ideas?

Thanks in advance.

Edit: No, this is not homework.

Here's what I have so far:

const RWCString crLF = "\r\n";
const RWCString doubleQuote = "\"";


RWTValOrderedVector<RWCString> Split(const RWCString &value, const RWCString &specialContent)
{
    RWTValOrderedVector<RWCString> result;

    unsigned index = 0;

    RWCString str = value;

    while ( ( index = str.index( specialContent, 0, RWCString::ignoreCase ) ) != RW_NPOS )
    {
        RWCString line = str(0, index);

        result.append(line);
        result.append(specialContent);

        str = str(index, str.length() - index);
        str = str(specialContent.length(), str.length() - specialContent.length());
    }

    if (str.length() > 0)
    {
        result.append(str);
    }

    return result;
}

void replaceSpecialContents(const RWCString &value)
{
    RWTValOrderedVector<RWCString> allStrings;

    RWTValOrderedVector<RWCString> crLFStrings = Split(value, crLF);

    for (unsigned i=0; i<crLFStrings.entries(); i++)
    {
        RWTValOrderedVector<RWCString> dqStrings = Split(crLFStrings[i], doubleQuote);

        if (dqStrings.entries() == 1)
        {
            allStrings.append(crLFStrings[i]);
        }
        else
        {
            for (unsigned j=0; j<dqStrings.entries(); j++)
            {
                allStrings.append(dqStrings[j]);
            }
        }
    }
}

Upvotes: 2

Views: 4676

Answers (8)

kmarks2

Reputation: 4875

Here's a way to do it that will work in C and C++:

//String to tokenize:
char str[] = "let's get some tokens!";

//A set of delimiters:
char delims[] = " ";

//List of tokens:
char *tok1 = NULL,
     *tok2 = NULL,
     *tok3 = NULL;

//Tokenize the string:
tok1 = strtok(str, delims);
tok2 = strtok(NULL, delims); //after you get the first token
tok3 = strtok(NULL, delims); //supply "NULL" as first strtok parameter

You can modify this in various ways. You can put all the "strtok(NULL, delims)" calls in a loop to make it more flexible, or you can interface with a C++ string by copying its .c_str() contents into a writable buffer first (strtok modifies the string it is given), etc.
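The loop idea can be sketched roughly as follows. This is only an illustration: the function name strtokSplit is made up here, and it assumes a C++11 compiler. Note that strtok treats every character in delims as its own delimiter and throws the delimiters away, so on its own it would not keep \r\n and " as array elements the way the question asks.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: collect every token that strtok produces.
// Each character in `delims` is a separate delimiter, and the
// delimiters themselves are discarded.
std::vector<std::string> strtokSplit(const std::string& input, const char* delims)
{
    // strtok writes into its argument, so work on a mutable,
    // NUL-terminated copy instead of the string's own storage.
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delims);
         tok != nullptr;
         tok = std::strtok(nullptr, delims))
    {
        tokens.push_back(tok);
    }
    return tokens;
}
```

For example, strtokSplit("let's get some tokens!", " ") yields the four words of that sentence.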

Upvotes: 2

DRH

Reputation: 8116

Building on the Rogue Wave SourcePro API you're using, you could use RWTRegex to split the string into tokens:

RWTValOrderedVector<RWCString> tokenize(const RWCString& str)
{
    RWTRegex<char> re("\\r\\n|\"|([^\"\\r]|\\r[^\\n])*|\\r$");

    RWTRegex<char>::iterator it(re, str);

    RWTValOrderedVector<RWCString> result;
    for (; it != RWTRegex<char>::iterator(); ++it) {
        result.append(it->subString(str));
    }
    return result;
}

For details on RWTRegex see http://www.roguewave.com/Portals/0/products/sourcepro/docs/12.0/html/sourceproref/classRWTRegex.html.

Upvotes: 1

Poliquin

Reputation: 2987

strtok overwrites each delimiter it finds with a NUL character ('\0'). That's why the delimiters are not included in the tokens it returns.

See man strtok for more information. I'm also experimenting with strtok and strtok_r, as I have an incoming char array like the following:

Hello~Milktea~This is my message\r\nMessage~I have a good watch~Cartier\r\n

I am going to first strip the ~ (tildes) followed by the \r\n, or vice versa.

Upvotes: 0

Loki Astari

Reputation: 264331

A really simple way is to just use flex.
You can build a really simple lexer for a C++ application in a few lines, and it is very readable.

Note:

Be careful with '\r\n'. If you open a file in text mode (the default), the standard stream will convert the platform's line termination sequence into '\n'. On some platforms that sequence is '\r\n', so if you read a stream from a file you may only see a '\n' character.

split.lex

%option c++
%option noyywrap
%%
\"           return 1;
\r\n         return 2;
[^"\r\n]*    return 3;
%%

main.cpp

#include <iostream>
#include <string>
#include "FlexLexer.h"

int main()
{
    yyFlexLexer     lexer(&std::cin, &std::cout);
    int             token;

    while((token = lexer.yylex()) != 0)
    {
        std::string  tok(lexer.YYText(), lexer.YYText() + lexer.YYLeng());
        std::cout << "Token: " << token << "(" << tok << ")\n";
    }
}

Build

% flex split.lex
% g++ main.cpp lex.yy.cc

Run (on a prepared file)

% cat testfile | ./a.exe
Token: 3(Here is some text)
Token: 2(
)
Token: 1(")
Token: 3(here is another line)
Token: 1(")
Token: 2(
)
Token: 3(And another line)

Upvotes: 1

Blastfurnace

Reputation: 18652

Here is a method that uses TR1 regex features.

std::string text("Here is some text\r\n\"here is another line\"\r\nAnd another line");
std::vector<std::string> vec;

std::regex rx("[\\w ]+|\\r\\n|\"");
std::sregex_iterator rxi(text.begin(), text.end(), rx), rxend;

for (; rxi != rxend; ++rxi)
{
    vec.push_back(rxi->str());
}

In my testing, this populates the vector with the 7 substrings in your example. I'm no expert so there may be a more correct regular expression than the one I'm using.
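One limitation of matching the text with [\w ]+ is that characters outside that class (punctuation, for instance) would be skipped. A possible variation, sketched here under the assumption of a C++11 standard library, is to write the regex for the delimiters only and let std::sregex_token_iterator return both the text between matches (submatch -1) and the matches themselves (submatch 0). The function name regexSplitKeepDelims is made up for this sketch.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split on "\r\n" or a double quote, keeping the
// delimiters as their own elements.
std::vector<std::string> regexSplitKeepDelims(const std::string& text)
{
    // Matches only the delimiters; add more alternatives here if needed.
    static const std::regex delim("\\r\\n|\"");

    std::vector<std::string> tokens;
    // -1 selects the text before each match, 0 selects the match itself.
    std::sregex_token_iterator it(text.begin(), text.end(), delim, {-1, 0});
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        // Adjacent delimiters produce an empty piece between them; skip it.
        if (it->length() > 0) {
            tokens.push_back(it->str());
        }
    }
    return tokens;
}
```

Adding another delimiter then means adding another alternative to the regex, with any regex metacharacters escaped.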

Upvotes: 0

rubenvb

Reputation: 76519

You can use string::find_first_of and string::substr. Just be careful to check for "empty" strings; find_first_of will find chars, so \r and \n will both be split off by the resulting algorithm.

Alternatively, iterate over the whole string, and copy the previous part when you come across another delimiter.
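A rough sketch of that idea follows. Because find_first_of works on single characters, this version uses string::find once per delimiter instead, so that "\r\n" is treated as one unit. The name splitKeepDelims and the choice to prefer the longest delimiter on a tie are assumptions of this sketch, not something taken from the question.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `text` on any of the (possibly multi-character)
// `delims`, keeping each delimiter as its own element. Empty pieces between
// adjacent delimiters are skipped.
std::vector<std::string> splitKeepDelims(const std::string& text,
                                         const std::vector<std::string>& delims)
{
    std::vector<std::string> tokens;
    std::string::size_type pos = 0;

    while (pos < text.size()) {
        // Find the delimiter that occurs earliest at or after pos.
        std::string::size_type best = std::string::npos;
        std::string::size_type bestLen = 0;
        for (const std::string& d : delims) {
            if (d.empty()) {
                continue;  // an empty delimiter would never advance pos
            }
            const std::string::size_type found = text.find(d, pos);
            if (found != std::string::npos &&
                (found < best || (found == best && d.size() > bestLen))) {
                best = found;
                bestLen = d.size();
            }
        }

        if (best == std::string::npos) {
            tokens.push_back(text.substr(pos));  // no delimiter left
            break;
        }
        if (best > pos) {
            tokens.push_back(text.substr(pos, best - pos));  // text before it
        }
        tokens.push_back(text.substr(best, bestLen));  // the delimiter itself
        pos = best + bestLen;
    }
    return tokens;
}
```

Called as splitKeepDelims(text, {"\r\n", "\""}) on the example from the question, this should give the seven elements listed there, and future delimiters only need to be added to the list.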

Upvotes: 1

Pavel Radzivilovsky

Reputation: 19104

Bisect the problem as follows:

  1. I have a pointer to a substring. How do I find the next substring?
  2. I have a pointer to a substring. How do I add it as the next element of the array?

Now, solve 1 and 2. If you run into any problems, ask again.

Upvotes: 1

Maxpm

Reputation: 25551

getline has an optional delimiter, so you can use stringstream to do it with very little effort on your part. The downside is that (I believe) it only works with one delimiter at a time.
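A minimal sketch of that approach, with getlineSplit as a made-up name. The delimiter is a single char, so "\r\n" cannot be expressed directly: splitting on '\n' would leave a trailing '\r' on each piece, and the delimiter itself is dropped rather than kept as an element.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split `text` on one delimiter character using
// std::getline. The delimiter is consumed and not stored.
std::vector<std::string> getlineSplit(const std::string& text, char delim)
{
    std::vector<std::string> parts;
    std::istringstream stream(text);
    std::string piece;
    while (std::getline(stream, piece, delim)) {
        parts.push_back(piece);
    }
    return parts;
}
```

To handle several delimiters this way, you would have to split once per delimiter and re-split each piece, much like the nested loops in the question.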

Upvotes: 1
