Someone asked this as a follow-up to an otherwise unrelated question (C# Regex split by multiple closing brackets sets), so I'm adding it as a separate question:
I have a number of lines, and if one of those lines is longer than 50 characters, I would like to split it at the last comma (,) within the first 50 characters.
Example input:
AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )
AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10', 'PM99', 'PM59' ) )
AND ( AUART IN ( 'PM01', 'PM01232132132', 'PM03' ) )
Expected output:
AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )
AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10',
'PM99', 'PM59' ) )
AND ( AUART IN ( 'PM01', 'PM01232132132',
'PM03' ) )
You might also use a pattern without capture groups, using a lookbehind that asserts either the start of the string or a comma.
Then assert that at least 50 characters follow, and match 1 to 49 characters followed by a comma. Because the quantifier is greedy, it backtracks to the last comma within those first 50 characters.
In the replacement, use the full match followed by a newline: $0\n
(?<=^|,)(?=.{50}).{1,49},
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        List<string> strings = new List<string>()
        {
            "AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )",
            "AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10', 'PM99', 'PM59' ) )",
            "AND ( AUART IN ( 'PM01', 'PM01232132132', 'PM03' ) )",
            "AND ( AUART IN ( 'PM0654654654654654654654654654651', 'PM02' ) )",
            "AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM04', 'PM11', 'PM12', 'PM13', 'PM14', 'PM15', 'PM16', 'PM21', 'PM22', 'PM23', 'PM24', 'PM25', 'PM31' ) )"
        };
        var regex = new Regex(@"(?<=^|,)(?=.{50}).{1,49},");
        foreach (string s in strings)
        {
            // $0 is the whole match, i.e. the segment up to and including the split comma
            Console.WriteLine(regex.Replace(s, "$0\n"));
        }
    }
}
Output:
AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )
AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10',
'PM99', 'PM59' ) )
AND ( AUART IN ( 'PM01', 'PM01232132132',
'PM03' ) )
AND ( AUART IN ( 'PM0654654654654654654654654654651', 'PM02' ) )
AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM04',
'PM11', 'PM12', 'PM13', 'PM14', 'PM15', 'PM16',
'PM21', 'PM22', 'PM23', 'PM24', 'PM25',
'PM31' ) )
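To see where this pattern actually cuts a line, here is a small sketch that lists each match of the same pattern with its start index and length. The class name SegmentDemo and the helper Segments are made up for illustration; only the pattern comes from the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SegmentDemo
{
    // Pattern from the answer: start at the beginning of the string or right after a
    // comma, require at least 50 characters ahead, then greedily take 1-49 characters
    // followed by a comma, backtracking to the last comma within those 50 characters.
    static readonly Regex SplitPoint = new Regex(@"(?<=^|,)(?=.{50}).{1,49},");

    // Hypothetical helper: the text of every match, in order.
    public static string[] Segments(string input) =>
        SplitPoint.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string input = "AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM04', 'PM11', 'PM12', 'PM13', 'PM14', 'PM15', 'PM16', 'PM21', 'PM22', 'PM23', 'PM24', 'PM25', 'PM31' ) )";
        foreach (Match m in SplitPoint.Matches(input))
        {
            // m.Index is where the segment starts; m.Length includes the trailing comma.
            Console.WriteLine($"{m.Index,3} {m.Length,3} [{m.Value}]");
        }
    }
}
```

For the long example line this reports three matches starting at indices 0, 48 and 96, with lengths 48, 48 and 40. The final " 'PM31' ) )" is not matched, so Replace leaves it untouched. Note also that the second and third matches begin with the space that followed the previous comma, so in the real output the continuation lines start with a space.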
You can start your regex with a positive lookahead assertion that succeeds only if the line is at least 50 characters long, then add a negative lookbehind that makes sure there are fewer than 50 characters before the comma you want to match:
(?=.{50})(.*)(?<!.{50})(,)
This finds the comma you want to split at (because .* is greedy, it is the last such comma). You can then, for example, replace it with a comma and a newline.
Full example:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string pattern = @"(?=.{50})(.*)(?<!.{50})(,)";
        string replacement = "$1,\n";
        List<string> inputs = new List<string>();
        inputs.Add("AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )"); // shorter than 50 chars
        inputs.Add("AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10', 'PM99', 'PM59' ) )");
        inputs.Add("AND ( AUART IN ( 'PM01', 'PM01232132132', 'PM03' ) )");
        inputs.Add("AND ( AUART IN ( 'PM0654654654654654654654654654651', 'PM02' ) )"); // first comma appearing later than character 50
        foreach (string input in inputs)
        {
            string result = Regex.Replace(input, pattern, replacement);
            Console.WriteLine(result);
        }
    }
}
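To check that the negative lookbehind really selects the last comma among the first 50 characters, the sketch below compares the index of the comma matched by the pattern (group 2) with a plain String.LastIndexOf search. The LastIndexOf version is a swapped-in non-regex technique used only as a cross-check, and the names CommaIndexCheck, RegexSplitIndex and PlainSplitIndex are hypothetical. Two details are worth knowing: (?=.{50}) requires at least 50 characters, so a line of exactly 50 characters is split as well, and . does not match a newline, so each string is assumed to hold a single line.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommaIndexCheck
{
    static readonly Regex Pattern = new Regex(@"(?=.{50})(.*)(?<!.{50})(,)");

    // Index of the comma the regex splits at (group 2), or -1 if there is no match.
    public static int RegexSplitIndex(string line)
    {
        Match m = Pattern.Match(line);
        return m.Success ? m.Groups[2].Index : -1;
    }

    // Cross-check without a regex: for lines of at least 50 characters,
    // the last comma at index 0..49, i.e. within the first 50 characters.
    public static int PlainSplitIndex(string line) =>
        line.Length < 50 ? -1 : line.LastIndexOf(',', 49);

    public static void Main()
    {
        string[] inputs =
        {
            "AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )",
            "AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10', 'PM99', 'PM59' ) )",
            "AND ( AUART IN ( 'PM01', 'PM01232132132', 'PM03' ) )",
            "AND ( AUART IN ( 'PM0654654654654654654654654654651', 'PM02' ) )"
        };
        foreach (string input in inputs)
        {
            Console.WriteLine($"regex: {RegexSplitIndex(input),3}  LastIndexOf: {PlainSplitIndex(input),3}");
        }
    }
}
```

Both columns agree for the four example inputs: the second and third lines are split at the commas with indices 47 and 40, and the first and fourth lines are not split.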
Note that this approach has some limitations.
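One limitation is that each line is split only once, so if the remainder is itself longer than 50 characters (input of more than about 100 characters), it stays unsplit. This can be addressed by capturing the remainder in a capture group and applying the regex to it again, here in a loop, as long as the remainder is longer than 50 characters: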
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        List<string> inputs = new List<string>();
        inputs.Add("AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )"); // shorter than 50 chars
        inputs.Add("AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10', 'PM99', 'PM59' ) )");
        inputs.Add("AND ( AUART IN ( 'PM01', 'PM01232132132', 'PM03' ) )");
        inputs.Add("AND ( AUART IN ( 'PM0654654654654654654654654654651', 'PM02' ) )"); // first comma appearing later than character 50
        inputs.Add("AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM04', 'PM11', 'PM12', 'PM13', 'PM14', 'PM15', 'PM16', 'PM21', 'PM22', 'PM23', 'PM24', 'PM25', 'PM31' ) )"); // string longer than 100 chars, i.e. the remainder needs to be processed again
        List<string> results = new List<string>();
        // Group 1: text up to and including the split comma; group 2: the remainder.
        var regex = new Regex(@"((?=.{50}).*(?<!.{50}),)(.*)");
        foreach (string input in inputs)
        {
            string str = input;
            while (!String.IsNullOrEmpty(str))
            {
                var match = regex.Match(str);
                if (match.Success)
                {
                    results.Add(match.Groups[1].Value);
                    // Process the remainder again on the next iteration.
                    str = match.Groups[2].Value;
                }
                else
                {
                    // No further split possible: keep the rest as it is.
                    results.Add(str);
                    break;
                }
            }
        }
        Console.WriteLine(String.Join("\n", results));
    }
}
Result:
AND ( AUART IN ( 'PM01', 'PM02', 'PM03' ) )
AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM10',
'PM99', 'PM59' ) )
AND ( AUART IN ( 'PM01', 'PM01232132132',
'PM03' ) )
AND ( AUART IN ( 'PM0654654654654654654654654654651', 'PM02' ) )
AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM04',
'PM11', 'PM12', 'PM13', 'PM14', 'PM15', 'PM16',
'PM21', 'PM22', 'PM23', 'PM24', 'PM25',
'PM31' ) )
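If 50 is not a fixed requirement, the loop above could be wrapped in a helper that takes the limit as a parameter. This is only a sketch: LineSplitter and SplitLine are hypothetical names, the pattern is the one from the example above with 50 replaced by maxLength, and maxLength is assumed to be at least 1.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LineSplitter
{
    // Hypothetical helper: the loop from the answer, with the limit as a parameter.
    public static List<string> SplitLine(string line, int maxLength)
    {
        // In an interpolated string {{ and }} are literal braces, so for
        // maxLength = 50 this builds ((?=.{50}).*(?<!.{50}),)(.*)
        var regex = new Regex($"((?=.{{{maxLength}}}).*(?<!.{{{maxLength}}}),)(.*)");
        var parts = new List<string>();
        string rest = line;
        while (!string.IsNullOrEmpty(rest))
        {
            Match match = regex.Match(rest);
            if (!match.Success)
            {
                parts.Add(rest);                   // short enough, or no usable comma
                break;
            }
            parts.Add(match.Groups[1].Value);      // up to and including the split comma
            rest = match.Groups[2].Value;          // continue with the remainder
        }
        return parts;
    }

    public static void Main()
    {
        string line = "AND ( AUART IN ( 'PM01', 'PM02', 'PM03', 'PM04', 'PM11', 'PM12', 'PM13', 'PM14', 'PM15', 'PM16', 'PM21', 'PM22', 'PM23', 'PM24', 'PM25', 'PM31' ) )";
        foreach (string part in SplitLine(line, 30))
        {
            Console.WriteLine(part);
        }
    }
}
```

With maxLength = 30 the long example line is cut into five pieces of 24 characters followed by a 27-character remainder; with 50 it gives the same four lines as the result above. Because nothing is removed, concatenating the pieces gives back the original line.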
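Another limitation is that a line whose first comma appears after character 50 (such as the fourth example input) is not split at all. To address this, you may consider breaking at commas (,) and whitespace (\s) instead of only at commas, if that is acceptable in your application. The regex for that would be:
((?=.{50}).*(?<!.{50})[\s,])(.*)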
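Applied to the fourth example input, the comma-or-whitespace pattern splits after the last space before the long quoted value. The sketch below reuses the loop from the example above with that pattern; WhitespaceSplitDemo and Split are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WhitespaceSplitDemo
{
    // Pattern from the answer: group 1 ends at the last comma OR whitespace character
    // within the first 50 characters, group 2 is the remainder.
    static readonly Regex CommaOrSpace = new Regex(@"((?=.{50}).*(?<!.{50})[\s,])(.*)");

    // Hypothetical helper: same loop as in the example above, with the new pattern.
    public static List<string> Split(string line)
    {
        var parts = new List<string>();
        string rest = line;
        while (!string.IsNullOrEmpty(rest))
        {
            Match match = CommaOrSpace.Match(rest);
            if (!match.Success)
            {
                parts.Add(rest);
                break;
            }
            parts.Add(match.Groups[1].Value);
            rest = match.Groups[2].Value;
        }
        return parts;
    }

    public static void Main()
    {
        string line = "AND ( AUART IN ( 'PM0654654654654654654654654654651', 'PM02' ) )";
        foreach (string part in Split(line))
        {
            // Brackets make the trailing space of the first part visible.
            Console.WriteLine($"[{part}]");
        }
    }
}
```

Keep in mind that the split position can also change for lines the comma-only version already handled: for the third example input, the last whitespace within the first 50 characters is the space before the closing parentheses, so only ") )" moves to the second line.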