Reputation: 424
I have a string like the one below:
Ireland, UK, United States of America, Belgium, Germany , Some Country, ...
I need help with a Regex or String.Replace function to remove the extra spaces, so that the result looks like:
Ireland,UK,United States of America,Belgium,Germany,Some Country,
Thank you.
Upvotes: 1
Views: 116
Reputation:
Although stribizhev's answer works well for this situation, I want to take this opportunity to highlight the negative performance impact of using regex for simple tasks.
AN ALTERNATIVE ROUGHLY TWICE AS FAST AS REGEX (which tends to be slow for simple tasks like this one)
My approach is based on recursive removal of blank spaces. I created two versions: the first uses a conventional loop (withoutRegex) and the second relies on LINQ (withoutRegex2; it is actually identical to stribizhev's answer except for the Regex part).
Private Function withoutRegex(input As String) As String
    Dim output As String = ""
    Dim temp() = input.Split(","c)
    For i As Integer = 0 To temp.Length - 1
        ' Trim each item, collapse inner runs of spaces, and re-add the comma (except after the last item)
        output = output & recursiveSpaceRemoval(temp(i).Trim()) & If(i < temp.Length - 1, ",", "")
    Next
    Return output
End Function

Private Function withoutRegex2(input As String) As String
    Return String.Join(",", _
        input _
        .Split(","c) _
        .Select(Function(x) recursiveSpaceRemoval(x.Trim())) _
        .ToArray())
End Function
Private Function recursiveSpaceRemoval(input As String) As String
    ' Replace every pair of spaces with a single space; repeat until nothing changes
    Dim output As String = input.Replace("  ", " ")
    If output = input Then Return output
    Return recursiveSpaceRemoval(output)
End Function
To prove my point, I created the following testing framework:
Dim input As String = "Ireland, UK, United States of America, Belgium, Germany , Some Country"
Dim output As String = ""
Dim count As Integer = 0
Dim countMax As Integer = 20
Dim with0 As Long = 0
Dim without As Long = 0
Dim without2 As Long = 0

While count < countMax
    count = count + 1

    ' Regex-based version
    Dim sw As Stopwatch = New Stopwatch
    sw.Start()
    output = withRegex(input)
    sw.Stop()
    with0 = with0 + sw.ElapsedTicks

    ' Loop-based version
    sw = New Stopwatch
    sw.Start()
    output = withoutRegex(input)
    sw.Stop()
    without = without + sw.ElapsedTicks

    ' LINQ-based version
    sw = New Stopwatch
    sw.Start()
    output = withoutRegex2(input)
    sw.Stop()
    without2 = without2 + sw.ElapsedTicks
End While

MessageBox.Show("With: " & with0.ToString)
MessageBox.Show("Without: " & without.ToString)
MessageBox.Show("Without 2: " & without2.ToString)
Where withRegex refers to stribizhev's answer, that is:
Private Function withRegex(input As String) As String
    Return String.Join(",", _
        input _
        .Split(","c) _
        .Select(Function(m) Regex.Replace(m.Trim(), "\p{Zs}{2,}", " ")) _
        .ToArray())
End Function
This is a simplistic test harness that measures very fast operations, where every tick matters (the 20 loop iterations are there precisely to make the measurements a bit more reliable). In fact, the results are affected even by the order in which the methods are called.
In any case, the differences among the methods remained more or less consistent across all my tests. The typical totals I got (in Stopwatch ticks, summed over the 20 iterations) are:
With: 2500-2700
Without: 1100-1300
Without2: 900-1200
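The order sensitivity mentioned above could probably be reduced by giving each method one untimed warm-up call and then timing many iterations in a single Stopwatch run. The following is only a minimal sketch under that assumption: MeasureTicks is a hypothetical helper name that is not part of the original test, and the iteration count is arbitrary.

```vbnet
Imports System
Imports System.Diagnostics

Module TimingSketch
    ' Hypothetical helper (not in the original test): returns the total Stopwatch ticks
    ' spent running "action" on "input" for the given number of iterations.
    Function MeasureTicks(action As Func(Of String, String), input As String, iterations As Integer) As Long
        ' Untimed warm-up call, so JIT compilation and one-time setup are not measured
        action.Invoke(input)

        Dim sw As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To iterations
            action.Invoke(input)
        Next
        sw.Stop()
        Return sw.ElapsedTicks
    End Function
End Module
```

From the class that holds the three functions, it could be called as MeasureTicks(AddressOf withRegex, input, 10000), and likewise for withoutRegex and withoutRegex2.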
NOTE: since this is a general criticism of regex performance (at least in situations simple enough to be replaced with alternatives along the lines of what I am showing here), any advice on how to improve regex performance in .NET is more than welcome. But please avoid vague, generic statements and be as specific as possible (e.g., by suggesting changes to the proposed test harness).
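One specific variant that could be added to the comparison is a Regex instance that is built once and reused, instead of calling the static Regex.Replace on every item. The sketch below is an assumption about what such a variant might look like, not a measured result: withCachedRegex and MultiSpace are hypothetical names, and whether RegexOptions.Compiled actually closes the gap for inputs this small has not been tested here.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module RegexReuseSketch
    ' Built once and shared. RegexOptions.Compiled trades a higher construction cost
    ' for (potentially) faster matching on repeated use.
    Private ReadOnly MultiSpace As New Regex("\p{Zs}{2,}", RegexOptions.Compiled)

    ' Hypothetical variant of withRegex that reuses the instance above
    Function withCachedRegex(input As String) As String
        Return String.Join(",", _
            input _
            .Split(","c) _
            .Select(Function(m) MultiSpace.Replace(m.Trim(), " ")) _
            .ToArray())
    End Function
End Module
```

To keep the comparison fair, the first call (which pays for constructing the Regex) would have to be left out of the timing.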
Upvotes: 2
Reputation: 626728
You can achieve that by splitting the input on commas, trimming each part and shrinking runs of multiple spaces into one, and then joining the parts back together with String.Join.
Just showing how it can be done using LINQ:
Console.Write(String.Join(",", _
    "Ireland, UK, United States of America, Belgium, Germany , Some Country," _
    .Split(","c) _
    .Select(Function(m) Regex.Replace(m.Trim(), "\p{Zs}{2,}", " ")) _
    .ToArray()))
The key thing is Regex.Replace(m.Trim(), "\p{Zs}{2,}", " "), where multiple spaces are shrunk into one.
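To see the pattern in isolation, here is a small sketch: \p{Zs} matches any character in the Unicode "space separator" category (the ordinary space, the no-break space, and so on, but not tabs or line breaks), and {2,} requires two or more of them in a row. CollapseSpaces is a hypothetical helper name used only for this illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ZsDemo
    ' Hypothetical helper: replaces each run of two or more space separators with one space
    Function CollapseSpaces(text As String) As String
        Return Regex.Replace(text, "\p{Zs}{2,}", " ")
    End Function

    Sub Main()
        ' Prints: United States of America
        Console.WriteLine(CollapseSpaces("United   States  of America"))
    End Sub
End Module
```

Single spaces are left untouched because they do not satisfy the {2,} quantifier; leading and trailing spaces are handled separately by Trim().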
Result: Ireland,UK,United States of America,Belgium,Germany,Some Country,
Upvotes: 4