Reputation: 85
I am working on a lexical analyzer using C++Builder XE6, and this is what I've done so far: I have two memos (memoIN, memoOUT). memoIN contains the text to be analyzed and memoOUT the output (a list of tokens). First, I strip all comments from the memoIN content using boost::regex, and this works like a charm. Now I'm stuck on how to extract all double-quoted strings from the text and display them as string tokens in the output memo.
All I have so far is an expression that removes all double quotes, which is not what I need. I need to extract them and display them, for example:
memoIN :
This is a "Double" Quote and this is "another one"
memoOUT :
<(String "Double") #Line 01 #Length 06)>
<(String "another one") #Line 01 #Length 11)>
Upvotes: 1
Views: 999
Reputation: 58382
Using Boost.Regex
Here's some sample code that demonstrates using boost::regex to extract text within quotes.
#include <string>
#include <iostream>
#include <boost/regex.hpp>

int main(int argc, char **argv) {
    // Capture any non-quotes that occur within double quotes.
    boost::regex re("\"([^\"]+)\"");

    // Input text
    std::string memoIN = "This is a \"Double\" Quote and this is \"another one\"";

    // Iterate through memoIN
    boost::sregex_iterator m1(memoIN.begin(), memoIN.end(), re);
    // Ending iterator (using the default constructor)
    boost::sregex_iterator m2;

    for (; m1 != m2; ++m1) {
        // Replace this with code to organize memoOUT
        std::cout << (*m1)[1].str() << std::endl;
    }

    return 0;
}
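To get closer to the memoOUT format shown in the question, the loop body could build one formatted line per match. The following is only a sketch under a few assumptions: it uses std::regex from the standard library instead of boost::regex so that it compiles without Boost (the iterator interface is nearly identical), the helper name extract_string_tokens is made up for illustration, the line number is taken to be the line on which the opening quote appears, and the length is the number of characters between the quotes.

```cpp
#include <algorithm>
#include <iomanip>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost or the VCL): returns one
// formatted token line per double-quoted string found in `text`.
std::vector<std::string> extract_string_tokens(const std::string& text) {
    // Same pattern as above: one or more non-quote characters inside quotes.
    static const std::regex re("\"([^\"]+)\"");
    std::vector<std::string> tokens;

    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        const std::smatch& m = *it;

        // Line number = 1 + number of newlines before the opening quote.
        auto line = 1 + std::count(text.begin(), text.begin() + m.position(0), '\n');

        // Build the line in the question's format, zero-padding to two digits.
        std::ostringstream out;
        out << "<(String \"" << m[1].str() << "\") #Line "
            << std::setfill('0') << std::setw(2) << line
            << " #Length " << std::setw(2) << m[1].length() << ")>";
        tokens.push_back(out.str());
    }
    return tokens;
}
```

Each returned string could then be appended to the output memo, for example with MemoOUT->Lines->Add(token.c_str()), after converting the memo text as described in the section on UnicodeString.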
Using a lexer library
Depending on how sophisticated your needs are, you may find that you're better off in the long run using a dedicated lexer and parser generator (like ANTLR3 C) than writing your own with Boost.Regex.
Interfacing with UnicodeString
There are several approaches to handling mismatches between C++Builder's AnsiString and UnicodeString and Standard C++'s std::string and std::wstring. One simple approach is to convert UnicodeString to std::string for internal text manipulation, then convert it back to UnicodeString for the UI. For example:
// Use AnsiString to convert from UTF-16 to a narrow character encoding
std::string memoIN_text = AnsiString(MemoIN->Text).c_str();
std::string memoOUT_text;
// Insert Boost.Regex manipulation here and assign the results to memoOUT_text
// Use implicit conversion from const char* to AnsiString/UnicodeString
MemoOUT->Text = memoOUT_text.c_str();
Converting from Unicode to ANSI may lose data, so you may want to use SetMultiByteConversionCodePage to tell C++Builder to use UTF-8 for AnsiString. (Character encoding is complicated enough to be its own topic.)
Upvotes: 3