atulya

Reputation: 539

C++ std::regex_replace: replacing different matches with different strings

I am trying to replace occurrences of CR (\r), LF (\n), and combinations of CR and LF as follows:

  1. Search for the pattern ([\r\n]+) // the match can be '\r', '\r\r\r\n\n', '\r\n\r\n', or any other combination.
  2. If the length is 1 and the character is CR, replace it with LF. // match is '\r'
  3. If the length is 2 and the two characters differ, replace them with LF. // match is '\r\n' or '\n\r'
  4. Otherwise replace with 2 LF. // any pattern longer than 2 characters
std::regex search_exp("([\r\n]+)");
// Intended per-match replacement: receives the matched run of CR/LF characters.
auto replace_func = [](const std::string& str_mat) -> std::string {
    if (str_mat.length() == 1) {
        return "\n";  // a lone CR becomes LF; a lone LF stays LF
    }
    if (str_mat.length() == 2 && str_mat.at(0) != str_mat.at(1)) {
        return "\n";  // "\r\n" or "\n\r"
    }
    return "\n\n";    // any other run
};
// This is the call I would like to write, but it does not compile.
auto result = std::regex_replace(str, search_exp, replace_func);

But std::regex_replace does not accept a lambda function. :(

Edit: Example: "\rThis is just an example\n Learning CPP \r\n Stuck at a point \r\n\n\r C++11 onwards \r\r\r\n\n"

Any suggestions?

Upvotes: 1

Views: 833

Answers (1)

Chris Maurer

Reputation: 2567

The problem with what you've written is that regex_replace applies the same replacement string to every match it finds; it has no overload that takes a callback. Your input needs one \n for some matches and two for others, so you can't tailor the replacement per match that way.
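For completeness, if a per-match callback is really what you want, one option is to write the loop by hand with std::sregex_iterator. This is only a minimal sketch: replace_each and normalize_run are hypothetical names I made up for illustration, not standard library functions, and I'm assuming a lone LF should stay a single LF.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not in the standard library): builds a new string in
// which every match of `re` is replaced by whatever `fn` returns for it.
template <typename Fn>
std::string replace_each(const std::string& input, const std::regex& re, Fn fn) {
    std::string out;
    auto tail = input.cbegin();  // start of the text not yet copied to `out`
    for (std::sregex_iterator it(input.cbegin(), input.cend(), re), end;
         it != end; ++it) {
        out.append(tail, (*it)[0].first);  // unmatched text before this match
        out += fn(it->str());              // per-match replacement
        tail = (*it)[0].second;            // continue after this match
    }
    out.append(tail, input.cend());        // text after the last match
    return out;
}

// The rules from the question. Assumption: a lone LF is kept as one LF.
std::string normalize_run(const std::string& run) {
    if (run.size() == 1) return "\n";                      // "\r" or "\n"
    if (run.size() == 2 && run[0] != run[1]) return "\n";  // "\r\n" or "\n\r"
    return "\n\n";                                         // any other run
}
```

You would call it as replace_each(str, std::regex("[\r\n]+"), normalize_run). It costs a few more lines than a single regex_replace call, but the replacement rules stay readable.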

You can, however, get clever with lookaheads that won't consume the final CR or LF. Then you rely on the repeated processing of regex_replace to add the second \n.

std::regex_replace(str, std::regex("(\r\n?|\n\r)(?![\n\r])|[\n\r]+(?=[\n\r])"), "\n")

This is pretty daunting, but you can take it apart into two pieces. The first half is (\r\n?|\n\r)(?![\n\r])| which looks for CR, or CRLF, or LFCR and looks ahead to make sure there is NO CR or LF following. The second half is [\n\r]+(?=[\n\r]) which looks for multiple CR or LF characters in any combination, but does not include the last one. That ensures the next repeat will find the leftover character and add the second LF. A lone LF matches neither half, so it is left unchanged.
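Here is a small sketch of the pattern wrapped in a function so it can be tried on the example string from the question. The name normalize_newlines is just something I picked for illustration; the pattern itself is unchanged, only split across two string literals so each half can carry a comment.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper name; the pattern is the one explained above.
std::string normalize_newlines(const std::string& text) {
    static const std::regex pattern(
        "(\r\n?|\n\r)(?![\n\r])"  // CR, CRLF, or LFCR with no CR/LF after it
        "|[\n\r]+(?=[\n\r])");    // a longer run, minus its final character
    // Every match becomes a single LF; the leftover final character of a
    // long run is handled by the next match (or is already an LF).
    return std::regex_replace(text, pattern, "\n");
}
```

With the example from the question, "\r\n\n\r" is handled in two steps: the second half matches "\r\n\n" and the first half then matches the remaining "\r", which gives two LFs in total.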

Upvotes: 2
