Müsli

Reputation: 1774

RegEx match strings

I have a question about regular expressions in C#.

I want to find the text between double-quote (") characters. Example:

Enum resultado = SPDialogBox.Instance.show<ACTION_ENUMs.TORNEO_SORTEAR>("Esto es una prueba");

Matches: Esto es una prueba

But in this example:

Enum resultado = SPDialogBox.Instance.show<ACTION_ENUMs.TORNEO_SORTEAR>("Esto es una prueba");
pKR_MESAPUESTOASIGNACION.CONFIGTORNEO_ID = Valid.GetInt(dr.Cells["CONFIGTORNEO_ID"].Value);

Matches: Esto es una prueba, but must not match CONFIGTORNEO_ID, because it is written between square brackets ([]).

In brief, I want to match strings between double-quote (") characters, but only if the string is not written between square brackets ([]).

Here is my code:

var pattern = "\"(.*?)\"";
var matches = Regex.Matches(fullCode, pattern, RegexOptions.Multiline);

foreach (Match m in matches)
{
    Console.WriteLine(m.Groups[1]);
}

That pattern matches all strings between " characters, but how can I modify it to exclude the strings that are written between square brackets?
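A self-contained reproduction of the problem, assuming fullCode holds the two-line example above (the class name Repro and method Contents are hypothetical). The lazy pattern pairs quotes from left to right, so it returns the bracketed CONFIGTORNEO_ID as well as the wanted string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Repro
{
    // The pattern from the question: a quote, the shortest run of characters, a quote.
    private static readonly Regex Quoted = new Regex("\"(.*?)\"", RegexOptions.Multiline);

    // Returns group 1 (the text between the quotes) of every match, in order.
    public static List<string> Contents(string fullCode)
    {
        var result = new List<string>();
        foreach (Match m in Quoted.Matches(fullCode))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    public static void Main()
    {
        string fullCode =
            "Enum resultado = SPDialogBox.Instance.show<ACTION_ENUMs.TORNEO_SORTEAR>(\"Esto es una prueba\");\r\n" +
            "pKR_MESAPUESTOASIGNACION.CONFIGTORNEO_ID = Valid.GetInt(dr.Cells[\"CONFIGTORNEO_ID\"].Value);";

        // Prints "Esto es una prueba" and then the unwanted "CONFIGTORNEO_ID".
        foreach (string s in Contents(fullCode))
        {
            Console.WriteLine(s);
        }
    }
}
```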

--- Edit ---

Here is another example:

List<String> IdSorteados = new List<String>();
int TablesToSort = 0;
foreach (UltraGridRow dr in fg.hfg_Rows)
{
    if (dr.Cells["MESA_ID"].Value == DBNull.Value && dr.Cells["Puesto"].Value == DBNull.Value && !Valid.GetBoolean(dr.Cells["BELIMINADO"].Value) && (Valid.GetBoolean(dr.Cells["Seleccionado"].Value) || SortearTodo))
        TablesToSort++;
}

The expression must not match MESA_ID (found within Cells["MESA_ID"].Value) nor Puesto (found within Cells["Puesto"].Value). It also must not match ].Value == DBNull.Value && dr.Cells[, which is the text between the closing quote of "MESA_ID" and the opening quote of "Puesto" in dr.Cells["MESA_ID"].Value == DBNull.Value && dr.Cells["Puesto"].

I hope I have made my intent clear.

Upvotes: 3

Views: 1500

Answers (4)

Eugen Mihailescu

Reputation: 3711

I often have to parse source code files (PHP, C++, Java, JS, CSS, etc.) and do some regexp replacements. To avoid replacing text inside strings/messages, I mask all strings before doing my replacements, so I have to capture every possible string and mask it.

This is how I capture all strings: /(['"])(\\\1|.)*?\1/gm which means:

  • capture everything that starts with single|double quote: ['"]
  • it may be followed by zero or more characters, including the same quote symbol (which does not end the string) if it is preceded by a backslash (the escape operator \): (\\\1|.)*
  • make sure the above pattern stops at the first possible closing quote rather than the last one (i.e. don't be greedy): ?
  • finally our string ends when it's followed by the same starting single|double quote: \1

I want this search to be both global (to capture all possible matches) and multi-line (a string is not expected to continue onto a new line delimited by CRLF, and . does not match a line break in any case).

If you want not only to find these strings but also to capture their contents, wrap (\\\1|.)*? in a capturing group, which gives the final pattern:

(['"])((\\\1|.)*?)\1
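The same pattern can be tried from C#, the language of the question. This is a sketch under assumptions, not the answerer's code: .NET has no /g or /m flags, so Regex.Matches stands in for the global flag, and no multi-line option is passed because the pattern has no ^ or $ anchors. StringLiteralFinder and Find are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper that runs this answer's pattern with the .NET engine.
public static class StringLiteralFinder
{
    // (['"])        group 1: the opening quote, single or double
    // ((\\\1|.)*?)  group 2: the contents; an escaped quote (\' or \") is consumed
    //               as one unit so it does not end the string; lazy, so it stops early
    // \1            the same quote character that opened the string
    private static readonly Regex Literal = new Regex(@"(['""])((\\\1|.)*?)\1");

    // Returns the contents (group 2) of every quoted string, in order.
    public static List<string> Find(string code)
    {
        var result = new List<string>();
        foreach (Match m in Literal.Matches(code))
        {
            result.Add(m.Groups[2].Value);
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        // The three example lines from this answer.
        string code =
            "defined ( 'WP_DEBUG' ) || define( '\\WP_DEBUG', true );\n" +
            "echo 'class=\"input-text card-number\" type=\"text\" maxlength=\"20\"';\n" +
            "echo 'How are you? I\\'m fine, thank you';\n";

        foreach (string s in StringLiteralFinder.Find(code))
        {
            Console.WriteLine(s);
        }
    }
}
```

Because each match consumes its whole string, the double quotes inside the single-quoted class="..." literal are captured as part of that literal rather than as separate strings.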

Examples of strings captured:

defined ( 'WP_DEBUG' ) || define( '\WP_DEBUG', true );
echo 'class="input-text card-number" type="text" maxlength="20"';
echo 'How are you? I\'m fine, thank you';

Check my pattern in an online regex tester.

Upvotes: 0

ddarellis

Reputation: 3960

I think something like this:

^[^\"]*\"([^\"]*)\".*$

Upvotes: 0

Sergey Kalinichenko

Reputation: 726579

To avoid matching quoted strings nested inside square brackets, the expression below checks that both of the following are true:

  • The last non-whitespace character preceding the opening double quote is not a [, and
  • The first non-whitespace character following the closing double quote is not a ]

This can be done using this regexp:

(?<!\[\s*)\"[^"]*\"(?!\s*\])

It uses the lookaround feature of the .NET regex engine, which, unlike many other engines, also accepts a variable-length lookbehind such as \[\s*.

Note how this expression avoids the reluctant (lazy) quantifier ? inside the quoted string by using [^"]* instead of .*?.
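A minimal C# sketch of this expression, run against the first example from the question. UnbracketedStrings and Find are hypothetical names; the regex is the one above written as a verbatim string, with each " doubled. Since the pattern has no capturing group, the whole match (quotes included) is returned.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the lookaround pattern from this answer.
public static class UnbracketedStrings
{
    // (?<!\[\s*)  the opening quote must not follow "[" plus optional whitespace
    //             (.NET allows a variable-length lookbehind such as \s*)
    // "[^"]*"     a quoted string; [^"]* cannot run past the next quote
    // (?!\s*\])   the closing quote must not precede optional whitespace plus "]"
    private static readonly Regex Pattern = new Regex(@"(?<!\[\s*)""[^""]*""(?!\s*\])");

    public static List<string> Find(string code)
    {
        var result = new List<string>();
        foreach (Match m in Pattern.Matches(code))
        {
            result.Add(m.Value); // includes the surrounding quotes
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        string fullCode =
            "Enum resultado = SPDialogBox.Instance.show<ACTION_ENUMs.TORNEO_SORTEAR>(\"Esto es una prueba\");\r\n" +
            "pKR_MESAPUESTOASIGNACION.CONFIGTORNEO_ID = Valid.GetInt(dr.Cells[\"CONFIGTORNEO_ID\"].Value);";

        // Only "Esto es una prueba" is printed; the bracketed string is rejected.
        foreach (string s in UnbracketedStrings.Find(fullCode))
        {
            Console.WriteLine(s);
        }
    }
}
```

Because the lookarounds only inspect the characters next to each quote, the regex does not track which quotes open a string and which close one. On the fragment dr.Cells["MESA_ID"].Value == DBNull.Value && dr.Cells["Puesto"].Value from the edit, it pairs the closing quote of "MESA_ID" with the opening quote of "Puesto" and returns "].Value == DBNull.Value && dr.Cells[", so that case is worth testing separately.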

Upvotes: 1

Brad Christie

Reputation: 101604

Simply use a negative lookbehind:

(?<!\[)

In other words, only match a string whose opening quote is not directly preceded by a [. Here is the code:

using System;
using System.Text.RegularExpressions;

String fullCode = "Enum resultado = SPDialogBox.Instance.show<ACTION_ENUMs.TORNEO_SORTEAR>(\"Esto es una prueba\");\r\n"
                + "pKR_MESAPUESTOASIGNACION.CONFIGTORNEO_ID = Valid.GetInt(dr.Cells[\"CONFIGTORNEO_ID\"].Value);";
// \x22 is the double-quote character; (?<!\[) rejects an opening quote directly after "["
String pattern = @"(?<!\[)\x22(.*?)\x22";
var matches = Regex.Matches(fullCode, pattern, RegexOptions.Multiline);
foreach (Match m in matches)
{
    Console.WriteLine(m.Groups[1].Value); // prints: Esto es una prueba
}

Upvotes: 2
