Narek

Reputation: 39881

C++ split string line by line using std::regex

I use this function to split the string:

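#include <regex>
#include <string>
#include <vector>

std::vector<std::string> splitString(const std::string& stringToSplit, const std::string& regexPattern)
{
    std::vector<std::string> result;

    const std::regex rgx(regexPattern);
    // Submatch index -1 selects the text between matches rather than the matches themselves.
    std::sregex_token_iterator iter(stringToSplit.begin(), stringToSplit.end(), rgx, -1);

    // A default-constructed iterator marks the end of the token sequence.
    for (std::sregex_token_iterator end; iter != end; ++iter)
    {
        result.push_back(iter->str());
    }

    return result;
}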

Now, if I want to split a string line by line (say, I have read a file's contents into a single variable), I do this:

auto vec = splitString(fileContent, "\\n");

On Windows, I get this:

line 1 \r
line 2 \r

This happens because Windows line endings are \r\n, so splitting on \n alone leaves the \r at the end of each line. I have tried using $ instead, but without success. What is the right way to match Windows line endings as well?

Upvotes: 2

Views: 3100

Answers (1)

Wiktor Stribiżew

Reputation: 626728

On Windows, Linux/Unix, OS X, iOS, and Android, line separators are \r, \n, or a combination of the two. So the simplest way to match them all is to put both characters into a character class and apply the + quantifier.

Thus, [\\r\\n]+ should "do the trick":

auto vec = splitString(fileContent, "[\\r\\n]+");
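
As a minimal self-contained sketch of this call: splitString is reproduced from the question so the snippet compiles on its own, while the wrapper name splitOnLineBreakRuns and the sample input are made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// splitString as posted in the question, reproduced so the sketch stands alone.
std::vector<std::string> splitString(const std::string& stringToSplit, const std::string& regexPattern)
{
    std::vector<std::string> result;
    const std::regex rgx(regexPattern);
    // Submatch index -1 selects the text between matches, not the matches themselves.
    std::sregex_token_iterator iter(stringToSplit.begin(), stringToSplit.end(), rgx, -1);
    for (std::sregex_token_iterator end; iter != end; ++iter)
    {
        result.push_back(iter->str());
    }
    return result;
}

// Hypothetical wrapper: treats any run of \r and \n characters as a single separator.
std::vector<std::string> splitOnLineBreakRuns(const std::string& text)
{
    return splitString(text, "[\\r\\n]+");
}
```

Calling splitOnLineBreakRuns("line 1\r\nline 2\nline 3") should give the three lines "line 1", "line 2" and "line 3", with no \r left at the end of any of them.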

EDIT:

As @FabioFracassi mentions, this will remove empty lines, because + treats a run of consecutive line breaks as a single separator. If empty lines should be preserved in the output, you can use

auto vec = splitString(fileContent, "(?:\\r\\n|\\r|\\n)");

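The alternation lists the longest option, \r\n, first because the regex engine tries alternatives from left to right and takes the first one that matches (at least with the default ECMAScript grammar). If \r came first, a Windows \r\n would be treated as two separate line breaks and produce a spurious empty line.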
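
To illustrate how the alternation keeps empty lines, here is a small sketch that uses the same token-iterator approach as splitString in the question. The function name splitLinesKeepEmpty and the sample input are hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits on exactly one line break at a time,
// so blank lines survive as empty strings in the result.
std::vector<std::string> splitLinesKeepEmpty(const std::string& text)
{
    // \r\n is listed first so a Windows line break is consumed as one separator.
    static const std::regex lineBreak("\\r\\n|\\r|\\n");

    std::vector<std::string> lines;
    // Submatch index -1 selects the text between separators.
    std::sregex_token_iterator iter(text.begin(), text.end(), lineBreak, -1);
    for (std::sregex_token_iterator end; iter != end; ++iter)
    {
        lines.push_back(iter->str());
    }
    return lines;
}
```

For the input "line 1\r\n\r\nline 3" this should return three elements: "line 1", an empty string for the blank line, and "line 3".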

Upvotes: 3
