Yordan Kanchelov

Reputation: 581

Regex performance issue on a really big string

Right now I am new to using regexes so I would really appreciate your help.

I have a really large string (I am parsing an AS3 file to JSON) and I need to find the trailing commas in its objects.

This is the regex I am using:

public static string TrimTraillingCommas(string jsonCode)
{
    var regex = new Regex(@"(.*?),\s*(\}|\])", (RegexOptions.Multiline));

    return regex.Replace(jsonCode, m => String.Format("{0} {1}", m.Groups[1].Value, m.Groups[2].Value));
}

The problem is that it's really slow. Without this replacement the program completes in 00:00:00.0289668; with it, the run takes 00:00:00.4096293.

Could someone suggest an improved regex or algorithm that replaces those trailing commas faster?

Here is where I start from (the string with the trailing commas)

Here is the end string I need

Upvotes: 1

Views: 186

Answers (2)

user557597

You don't need the first expression .*? because it only captures text that the replacement puts back unchanged, and you can convert the alternation
into a character class. That's about the best you can do with a regex.

var regex = new Regex(@",[^\S\r\n]*([}\]])");
return regex.Replace(jsonCode, " $1");
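
One thing worth checking before adopting this pattern: [^\S\r\n]* matches only whitespace that is not a line break, whereas the \s* in the question also matches line breaks. The sketch below compares the two on a comma that sits on the line before the closing bracket. The class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, only here to compare the two whitespace classes.
public static class WhitespaceClassDemo
{
    // Whitespace that is neither \r nor \n, as in the answer above.
    public static string HorizontalOnly(string input)
    {
        return Regex.Replace(input, @",[^\S\r\n]*([}\]])", " $1");
    }

    // Any whitespace, including line breaks, as in the question's pattern.
    public static string AnyWhitespace(string input)
    {
        return Regex.Replace(input, @",\s*([}\]])", " $1");
    }
}
```

With the input "[1, 2, ]" both methods return "[1, 2 ]". With "[1, 2,\n]" (comma, line break, bracket) HorizontalOnly leaves the string unchanged, while AnyWhitespace returns "[1, 2 ]". If the generated text puts closing brackets on their own lines, the \s* form is probably the one you want.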

Upvotes: 0

Douglas

Reputation: 54877

You can simplify your regular expression by eliminating the capture groups and using a lookahead in place of the second one:

var regex = new Regex(@",\s*(?=\}|\])");
return regex.Replace(jsonCode, " ");
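
As a complete method, the lookahead version might look like the sketch below. It goes beyond the snippet above in three ways, all of them assumptions: the Regex is stored in a static field so it is built once, RegexOptions.Compiled is added (whether that pays off depends on how often the method runs, so measure it), and the alternation is written as a character class. The names JsonCleanup and TrimTrailingCommas are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper class; not part of the question's code.
public static class JsonCleanup
{
    // Built once and reused across calls instead of on every invocation.
    // The lookahead (?=[}\]]) checks for a closing bracket without consuming it,
    // so no capture group is needed in the replacement.
    private static readonly Regex TrailingComma =
        new Regex(@",\s*(?=[}\]])", RegexOptions.Compiled);

    public static string TrimTrailingCommas(string jsonCode)
    {
        // Replaces the comma and any following whitespace with a single space.
        return TrailingComma.Replace(jsonCode, " ");
    }
}
```

For example, JsonCleanup.TrimTrailingCommas("{ \"a\": [1, 2, ], }") returns { "a": [1, 2 ] }. Note that a regex like this does not know about string literals, so a comma followed by a bracket inside a quoted value would be rewritten as well.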

Upvotes: 1
