Ricky

How do I extract a string between braces on the same level?

Suppose I have an arbitrary string in the following format:

Blablabla
{
    "Some Text"
    2
    {
        "Sub Text"
         99
    }
    2
    {
        "Sub Text"
         99
    }
}
Blablabla2
{
    "Some Text"
    2
    {
        "Sub Text"
         99
    }
}

I need to be able to extract from this string each substring in between the first layer of delimiters ({ and }). So, in this example, running the function below:

ExtractStringBetweenDelimitersOnSameLevel(string, "{", "}")

Should extract the following substring from the original string, and then return it:

    "Some Text"
    2
    {
        "Sub Text"
         99
    }

The problem is that it returns a string that is too short, because of the second (nested) layer of delimiters.

Here's my code:

#include <iostream>
#include <string>

const int Count(
   const std::string& haystack,
   const std::string& needle,
   const int starting_index,
   const int maximum_index)
{
   int total = 0;
   int offset = starting_index;

   size_t current_index = std::string::npos;
   while ((current_index = haystack.find(needle, offset)) != std::string::npos)
   {
      if (current_index >= maximum_index)
      {
         break;
      }

      total++;
      offset = static_cast<int>(current_index + needle.size());
   }

   return total;
}

const size_t FindNthDelimiter(
   const std::string& haystack,
   const std::string& needle,
   const int nth)
{
   int total_found = 0;
   int offset = 0;

   size_t current_index = std::string::npos;
   while ((current_index = haystack.find(needle, offset)) != std::string::npos)
   {
      total_found++;
      offset = static_cast<int>(current_index) + 1;

      if (total_found == nth)
      {
         return offset;
      }
   }

   std::cout << "String does not have nth element." << std::endl;

   return offset;
}

std::string ExtractStringBetweenDelimitersOnSameLevel(
   std::string& original_string,
   const std::string& opening_delimiter,
   const std::string& closing_delimiter)
{
   // Find the first delimiter...
   const size_t first_delimiter = original_string.find(opening_delimiter);
   if (first_delimiter != std::string::npos)
   {
      const size_t second_delimiter = original_string.find(closing_delimiter);
      if (second_delimiter != std::string::npos)
      {
         // Total first delimiters found until first closed delimiter...
         int total_first_delimiters = Count(original_string, opening_delimiter, static_cast<int>(first_delimiter), static_cast<int>(second_delimiter));
         const size_t index_of_nth_closer = FindNthDelimiter(original_string, closing_delimiter, total_first_delimiters);

         std::string needle = original_string.substr(first_delimiter + opening_delimiter.size(), index_of_nth_closer - opening_delimiter.size() - 1);
         original_string.erase(first_delimiter, index_of_nth_closer + closing_delimiter.size());

         return needle;
      }
   }

   return "";
}

Answers (1)

Sam Varshavchik

"The more you overthink the plumbing, the easier it is to stop up the drain." -- Scotty, Star Trek III.

The shown code looks way overengineered for such a simple task.

It also does not appear to fully implement the given task, which was described as extracting every top-level string:

each substring in between the first layer of delimiters

But the shown code appears to extract only the first one. It's not worth trying to figure out where that complicated algorithm goes wrong; it's easier to rewrite it to do the entire task, at half the original size. The core algorithm shouldn't take more than a dozen or two lines of code, while the code to extract just the first string was already several times longer than that.

The following example extracts every top-level string between matching { and } delimiters and passes each one to a callback. main() supplies a sample lambda that prints each string to std::cout.

#include <string>
#include <algorithm>
#include <iostream>

template<typename functor_type> void ExtractStringBetweenDelimitersOnSameLevel(
    const std::string &original_string,
    char opening_delimiter, // Should be '{'
    char closing_delimiter, // Should be '}'
    functor_type &&functor) // Lambda that receives each string.
{
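    // b walks the string, e is its end, and p marks where the contents of
    // the current top-level block begin.
    auto b=original_string.begin(), e=original_string.end(), p=b;

    int nesting_level=0;

    while (b != e)
    {
        // A closer that brings the nesting level back to zero ends a
        // top-level block: report everything since p. Stray closers at
        // level zero are ignored.
        if (*b == closing_delimiter)
        {
            if (nesting_level > 0 && --nesting_level == 0)
            {
                functor(std::string(p, b));
            }
        }

        // An opener at level zero starts a new top-level block; its contents
        // begin right after the delimiter (b has already been advanced).
        if (*b++ == opening_delimiter)
        {
            if (nesting_level++ == 0)
                p=b;
        }
    }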
}


int main()
{
    std::string search_string="\n"
        "Blablabla\n"
        "{\n"
        "    \"Some Text\"\n"
        "    2\n"
        "    {\n"
        "        \"Sub Text\"\n"
        "         99\n"
        "    }\n"
        "    2\n"
        "    {\n"
        "        \"Sub Text\"\n"
        "         99\n"
        "    }\n"
        "}\n"
        "Blablabla2\n"
        "{\n"
        "    \"Some Text\"\n"
        "    2\n"
        "    {\n"
        "        \"Sub Text\"\n"
        "         99\n"
        "    }\n"
        "}";

    ExtractStringBetweenDelimitersOnSameLevel
        (search_string,
         '{',
         '}',
         [](const std::string &string)
         {
             std::cout << "Extracted: " << string << std::endl;
         });
}

Your homework assignment is to modify this to handle multi-character delimiters. This shouldn't be much more complicated, either.
