Extract double quotes using boost::regex in C++Builder

Question

I am working on a lexical analyzer using C++Builder XE6, and this is what I've done so far: I have two memos (memoIN, memoOUT). memoIN contains the text to be analyzed and memoOUT the output (a list of tokens). First, I strip all comments from the memoIN content using boost::regex, and this works like a charm. Now I'm stuck on how to extract all double-quoted strings from the text and display them as string tokens in the output memo.

All I have so far is an expression that removes all double quotes, which is not what I need. I need to extract them and display them, for example:

memoIN :

This is a "Double" Quote and this is "another one"

memoOUT :

<(String "Double") #Line 01 #Length 06)>
<(String "another one") #Line 01 #Length 11)>

Josh Kelley · Accepted Answer

Using Boost.Regex

Here's some sample code that demonstrates using boost::regex to extract text within quotes.

#include <iostream>
#include <string>
#include <boost/regex.hpp>

int main(int argc, char **argv) {
  // Capture any non-quotes that occur within double quotes.
  boost::regex re("\"([^\"]+)\"");

  // Input text
  std::string memoIN = "This is a \"Double\" Quote and this is \"another one\"";

  // Iterate through memoIN
  boost::sregex_iterator m1(memoIN.begin(), memoIN.end(), re);

  // Ending iterator (using the default constructor)
  boost::sregex_iterator m2;

  for (; m1 != m2; ++m1) {
    // Replace this with code to organize memoOUT
    std::cout << (*m1)[1].str() << std::endl;
  }

  return 0;
}
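The loop above prints only the quoted text, while the question also asks for the line number and length of each string. The following is a minimal sketch, not part of the original answer: two hypothetical helpers, lineNumberAt and formatStringToken, compute a 1-based line number by counting '\n' characters before a match offset and format a token in the layout given in the question (copied verbatim, including its closing parentheses). It uses only the standard library; that it builds unchanged with the older C++Builder compilers is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: 1-based line number of the character at `offset`,
// found by counting the '\n' characters that precede it.
// Memo text uses "\r\n" line breaks; counting only '\n' still works.
int lineNumberAt(const std::string& text, std::size_t offset) {
    return 1 + static_cast<int>(
        std::count(text.begin(), text.begin() + offset, '\n'));
}

// Hypothetical helper: formats one string token in the layout from the
// question, e.g. <(String "Double") #Line 01 #Length 06)>
std::string formatStringToken(const std::string& value, int line) {
    std::ostringstream out;
    out << "<(String \"" << value << "\") #Line "
        << std::setw(2) << std::setfill('0') << line
        << " #Length "
        << std::setw(2) << std::setfill('0') << value.size()
        << ")>";
    return out.str();
}
```

Inside the loop, the offset of the captured text is (*m1)[1].first - memoIN.begin(), so a call such as formatStringToken((*m1)[1].str(), lineNumberAt(memoIN, (*m1)[1].first - memoIN.begin())) would produce one output line per match.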

Using a lexer library

Depending on how sophisticated your needs are, you may find that you're better off in the long run using a dedicated lexer and parser generator (such as ANTLR3 C) than writing your own with Boost.Regex.

Interfacing with UnicodeString

There are several approaches to handling mismatches between C++Builder's AnsiString and UnicodeString and Standard C++'s std::string and std::wstring. One simple approach is to convert UnicodeString to std::string for internal text manipulation then convert it back to UnicodeString for the UI. For example:

// Use AnsiString to convert from UTF-16 to a narrow character encoding
std::string memoIN_text = AnsiString(MemoIN->Text).c_str();

std::string memoOUT_text;
// Insert Boost.Regex manipulation here and assign the results to memoOUT_text

// Use implicit conversion from const char* to AnsiString/UnicodeString
MemoOUT->Text = memoOUT_text.c_str();

Converting from Unicode to ANSI can lose any characters that the current ANSI code page cannot represent, so you may want to use SetMultiByteConversionCodePage to tell C++Builder to use UTF-8 for AnsiString. (Character encoding is complicated enough to be its own topic.)
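When the results are assigned to memoOUT_text, each token needs its own line in the memo. A small sketch, assuming the tokens have already been formatted into a std::vector<std::string>: the hypothetical helper joinMemoLines joins them with "\r\n", the line break used by the Windows multi-line edit control that a TMemo wraps (this is an assumption about the target control, not something the answer above states).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins already-formatted token lines with "\r\n"
// so that each token appears on its own line in the output memo.
std::string joinMemoLines(const std::vector<std::string>& lines) {
    std::string result;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (i > 0) {
            result += "\r\n";  // separator only between lines, none trailing
        }
        result += lines[i];
    }
    return result;
}
```

The joined result can then be assigned with MemoOUT->Text = joinMemoLines(tokens).c_str(); as in the conversion example.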
