ahmedsafan86

Reputation: 1794

What regular expression finds only string literals in C++ source files?

I have a C++ application in which I'm converting every string literal, and the related functions, to the generic-text types to enable Unicode. In other words, the following conversions are being made:

const char* str = "this is \"simple string\""; //=> const TCHAR* str = _T("this is \"simple string\"");
MessageBoxA(NULL, "message", "title", MB_OK);//=>MessageBox(NULL, _T("message"), _T("title"), MB_OK);    
size_t len = strlen(str);//=>size_t len = _tcslen(str);

The big problem is that the application contains a lot of string literals. I need a regular expression pattern that finds only the string literals so I can replace them with _T(previous_str). I found expressions on the web, many of them on Stack Overflow, but they also match the header includes:

#include "stdafx.h" // => #include _T("stdafx.h")

I also need to skip strings that start with _T( and end with ), since those have already been converted.

Upvotes: 0

Views: 720

Answers (1)

zx81

Reputation: 41838

Ahmed, this is an interesting question. Let's talk about how we would do this with regex. There are a number of options; here is what I would do.

A. I would process the files outside of Visual Studio so you can use the full power of regex. You could use C++, C# or a scripting language such as PHP or Python, and feed it an array of files to process, or a folder.

B. Here is a regex that would capture the strings you want into Group 1:

(?s)_T\([^)]*\)|#include[^\n]*|"((?:[^"]|(?<=\\)")+)(?<!\\)"

With this regex, we want to completely ignore the overall match returned, instead only focusing on the Group 1 captures, if any.

In your test text, the Group 1 captures are this is \"simple string\" from the first line, and message and title from the MessageBoxA call.

This captures only the inside of each string. We'll probably need the double quotes for the replacement as well, so to include them, just move them inside Group 1:

(?s)_T\([^)]*\)|#include[^\n]*|("(?:[^"]|(?<=\\)")+(?<!\\)")

C. When calling your language's regex Replace function, instead of directly passing a replacement, you pass it a callback function. That function will automatically have access to the Group 1 matches (that is how replace callback works), and you can manipulate the replacement to your heart's content: for instance, if Group 1 is empty, don't replace (it means we matched the strings you want to avoid). If you have a Group 1, do your concatenation magic.

Hard to be more specific but this is the general recipe I would follow with regex.
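
As a rough illustration of steps B and C, here is a minimal sketch in Python (one of the scripting options mentioned in A). The names PATTERN, wrap_literal and convert_source are made up for this example. It uses the second regex above unchanged, so it shares that regex's limits: an already converted _T(...) whose string contains a ) character, a string that ends in an escaped backslash, or an empty "" literal would need extra handling.

```python
import re

# Second regex from step B: the quoted literal, quotes included, lands in Group 1.
# The first two alternatives match text we want to leave alone and capture nothing.
PATTERN = re.compile(
    r'(?s)_T\([^)]*\)|#include[^\n]*|("(?:[^"]|(?<=\\)")+(?<!\\)")'
)


def wrap_literal(match):
    """Replacement callback (hypothetical name): wrap Group 1 in _T(...)."""
    literal = match.group(1)
    if literal is None:
        # We matched an #include line or an existing _T(...): keep it as it is.
        return match.group(0)
    return "_T(" + literal + ")"


def convert_source(text):
    """Return text with every bare string literal wrapped in _T(...)."""
    return PATTERN.sub(wrap_literal, text)


if __name__ == "__main__":
    sample = "\n".join([
        '#include "stdafx.h"',
        r'const char* str = "this is \"simple string\"";',
        'MessageBoxA(NULL, "message", "title", MB_OK);',
    ])
    print(convert_source(sample))
```

Run on the lines from the question, this should leave the #include line alone and wrap the three literals. Renaming functions such as strlen or MessageBoxA is a separate pass that this sketch does not attempt.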

Upvotes: 1
