Reputation: 81493
I require the following
Put simply, this is just a basic sentence tidy-up (with regard to commas and spaces).
My current solution works, but I'm wondering whether some seemingly redundant steps could be removed with smarter regular expressions.
Current solution
[TestCase(" , aaa,bbb ,, , ccc, ddd,, eee fff , , ggg , hhh ,", ExpectedResult = "aaa, bbb, ccc, ddd, eee fff, ggg, hhh")]
[TestCase(",, aaa,bbb ,, , ccc, ddd,, eee fff , , ggg , hhh ,, ", ExpectedResult = "aaa, bbb, ccc, ddd, eee fff, ggg, hhh")]
[TestCase(",, ,,", ExpectedResult = "")]
public string CleanSentence(string source)
{
    var duplicateSpaces = new Regex(@"[ ]{2,}", RegexOptions.None);
    var spacesBeforeCommas = new Regex(@"\s+(?=,)", RegexOptions.None);
    var duplicateCommas = new Regex(@"[,]{2,}", RegexOptions.None);
    var loneComma = new Regex(@",(?=[^\s])", RegexOptions.None);
    var multiCommaAndSpace = new Regex(@"(, ){2,}", RegexOptions.None);
    // Collapse runs of spaces and runs of commas.
    source = duplicateSpaces.Replace(source, " ");
    source = duplicateCommas.Replace(source, ",");
    // Remove whitespace before a comma, then add a space after any comma that lacks one.
    source = spacesBeforeCommas.Replace(source, "");
    source = loneComma.Replace(source, ", ");
    // Collapse the repeated ", " separators created by the previous step.
    source = multiCommaAndSpace.Replace(source, ", ");
    // Trim leftover commas and spaces from both ends.
    source = source.Trim(',', ' ');
    return source;
}
Test cases
var test1 = " , aaa,bbb ,, , ccc, ddd,, eee fff , , ggg , hhh ,";
var test2 = ",, aaa,bbb ,, , ccc, ddd,, eee fff , , ggg , hhh ,, ";
var test3 = ",, ,,";
Intended results
var Result1 = "aaa, bbb, ccc, ddd, eee fff, ggg, hhh";
var Result2 = "aaa, bbb, ccc, ddd, eee fff, ggg, hhh";
var Result3 = "";
I'm wondering whether a couple of these seemingly redundant steps can be removed.
Note: this is a quantifiable question. The goal is to reduce the number of steps by using smarter regular expressions.
Upvotes: 1
Views: 1376
Reputation: 81493
With some inspiration from John Woo, I managed to get it down to:
source = Regex.Replace(source, "[ ]{2,}", " ");
source = Regex.Replace(source, "[, ]*,[, ]*", ", ");
return source.Trim(',', ' ');
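To make the three statements above easier to try out, here is a minimal self-contained sketch. The class name SentenceCleaner and the method name Clean are hypothetical wrappers added for illustration; only the three statements inside the method come from the answer. The key pattern, [, ]*,[, ]*, matches any run of commas and spaces that contains at least one comma, so every separator collapses to ", " in a single pass.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper; only the three statements inside Clean come from the answer.
public static class SentenceCleaner
{
    public static string Clean(string source)
    {
        // Collapse runs of spaces that are not next to a comma, e.g. inside "eee   fff".
        source = Regex.Replace(source, "[ ]{2,}", " ");
        // Any run of commas and spaces containing at least one comma becomes ", ".
        source = Regex.Replace(source, "[, ]*,[, ]*", ", ");
        // Drop the separator left over at either end.
        return source.Trim(',', ' ');
    }
}
```

With this sketch, SentenceCleaner.Clean(",, ,,") returns an empty string, because the whole input is one separator run that is replaced by ", " and then trimmed away.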
Upvotes: 0
Reputation: 22866
Splitting on spaces and commas is enough as long as no item contains a space of its own (with the test input from the question, "eee fff" comes back as "eee, fff"):
public string CleanSentence(string source)
{
    return string.Join(", ", (source ?? "").Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries));
}
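If items may contain spaces, as "eee fff" does in the question's tests, one possible variant is to split on commas only and tidy each piece afterwards. This is a sketch under that assumption, not the answer's own code; the class name CommaSplitCleaner and the method name Clean are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical variant: split on commas only so that spaces inside an item survive.
public static class CommaSplitCleaner
{
    public static string Clean(string source)
    {
        var items = (source ?? "")
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            // Trim each piece and collapse inner whitespace runs to a single space.
            .Select(item => Regex.Replace(item.Trim(), @"\s+", " "))
            // Pieces that held only whitespace are empty after trimming; drop them.
            .Where(item => item.Length > 0);
        return string.Join(", ", items);
    }
}
```

The extra Where step is needed because RemoveEmptyEntries only drops pieces that are empty before trimming; a piece such as " " between two commas would otherwise produce a stray separator.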
Upvotes: 0
Reputation: 1
I would suggest two replacements. First replace
\s+\b
with a single space, then replace
[,\s]*,
with ,
This will also remove the spaces at the end of each item.
Hope this helps.
Upvotes: 0
Reputation: 263693
I have another solution that uses only built-in string functions and a little Regex.Replace.
public string CleanString(string rawString)
{
    if (string.IsNullOrWhiteSpace(rawString)) return rawString;
    rawString = Regex.Replace(rawString, @"\s+", " ");
    rawString = Regex.Replace(rawString, @"(?<=,)\s+|\s+(?=,)", "");
    return string.Join(", ", rawString.Trim().Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)).Trim();
}
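To see what each step of CleanString contributes, the following sketch returns the intermediate string after every stage. The class name CleanStringTrace and the method name Stages are hypothetical helpers for illustration; the three expressions are the ones used above, and the null/whitespace guard is left out for brevity.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical tracing helper; mirrors the three steps of CleanString.
public static class CleanStringTrace
{
    // Returns the string after each of the three steps, in order.
    public static string[] Stages(string raw)
    {
        // Step 1: collapse every whitespace run to a single space.
        var collapsed = Regex.Replace(raw, @"\s+", " ");
        // Step 2: delete whitespace that directly follows or directly precedes a comma.
        var tight = Regex.Replace(collapsed, @"(?<=,)\s+|\s+(?=,)", "");
        // Step 3: split on commas, drop empty pieces, and rejoin with ", ".
        var joined = string.Join(", ",
            tight.Trim().Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)).Trim();
        return new[] { collapsed, tight, joined };
    }
}
```

For the input ",, aaa , bbb ," the second stage yields ",,aaa,bbb," and the third yields "aaa, bbb". The lookbehind and lookahead let step 2 remove the whitespace without consuming the commas, which is why the empty pieces between consecutive commas can then be dropped by RemoveEmptyEntries.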
Upvotes: 2