virrion

Reputation: 424

How to remove extra spaces from a specific string?

I have a string like the one below:

Ireland, UK, United States of America,     Belgium, Germany   , Some     Country, ...

I need help using Regex or the String.Replace function to remove the extra spaces, so that the result looks like this:

Ireland,UK,United States of America,Belgium,Germany,Some Country,

Thank you.

Upvotes: 1

Views: 116

Answers (2)

user2480047

Although the answer written by stribizhev is good for this situation, I want to take this opportunity to highlight the (negative) performance impact of using regex for simple tasks.

ALTERNATIVE NOTABLY FASTER (about 2x in my tests) THAN REGEX (which tends to be pretty slow in situations like this one)

My approach removes repeated spaces recursively. I created two versions: one with a conventional loop (withoutRegex) and one relying on LINQ (withoutRegex2; it is identical to stribizhev's answer except for the Regex part).

Private Function withoutRegex(input As String) As String

    Dim output As String = ""

    Dim temp() = input.Split(","c)
    For i As Integer = 0 To temp.Length - 1
        output = output & recursiveSpaceRemoval(temp(i).Trim()) & If(i < temp.Length - 1, ",", "")
    Next

    Return output

End Function

Private Function withoutRegex2(input As String) As String

    Return String.Join(",", _
    input _
    .Split(","c) _
    .Select(Function(x) recursiveSpaceRemoval(x.Trim())) _
    .ToArray())

End Function

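Private Function recursiveSpaceRemoval(input As String) As String

    ' Each pass replaces every pair of spaces with a single space
    Dim output As String = input.Replace("  ", " ")

    ' Stop once a pass changes nothing, i.e. no double spaces remain
    If output = input Then Return output
    Return recursiveSpaceRemoval(output)

End Function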
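
The recursive helper above creates a new string on every pass that changes something. As a rough sketch of a variant that walks the input once, the following could be used instead. CollapseSpaces is a hypothetical name that is not part of the original answer, it handles only the ordinary space character, and it has not been included in the timings reported here.

```vbnet
Imports System.Text

Module SpaceCollapseSketch

    ' Hypothetical helper (not from the original answer): collapses each run
    ' of ordinary spaces (U+0020) into a single space in one pass.
    Function CollapseSpaces(input As String) As String
        Dim sb As New StringBuilder(input.Length)
        Dim previousWasSpace As Boolean = False

        For Each c As Char In input
            If c = " "c Then
                ' Keep only the first space of a run
                If Not previousWasSpace Then sb.Append(c)
                previousWasSpace = True
            Else
                sb.Append(c)
                previousWasSpace = False
            End If
        Next

        Return sb.ToString()
    End Function

    Sub Main()
        Console.WriteLine(CollapseSpaces("Some     Country")) ' prints "Some Country"
    End Sub

End Module
```

It could be dropped into withoutRegex or withoutRegex2 in place of recursiveSpaceRemoval; whether it is actually faster for inputs this short would need to be measured.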

To prove my point, I created the following testing framework:

Dim input As String = "Ireland, UK, United States of America,     Belgium, Germany   , Some     Country"
Dim output As String = ""

Dim count As Integer = 0
Dim countMax As Integer = 20
Dim with0 As Long = 0
Dim without As Long = 0
Dim without2 As Long = 0

While count < countMax

    count = count + 1
    Dim sw As Stopwatch = New Stopwatch
    sw.Start()
    output = withRegex(input)
    sw.Stop()
    with0 = with0 + sw.ElapsedTicks

    sw = New Stopwatch
    sw.Start()
    output = withoutRegex(input)
    sw.Stop()
    without = without + sw.ElapsedTicks

    sw = New Stopwatch
    sw.Start()
    output = withoutRegex2(input)
    sw.Stop()
    without2 = without2 + sw.ElapsedTicks

End While

MessageBox.Show("With: " & with0.ToString)
MessageBox.Show("Without: " & without.ToString)
MessageBox.Show("Without 2: " & without2.ToString)

Where withRegex refers to stribizhev's answer, that is:

Private Function withRegex(input As String) As String

    Return String.Join(",", _
    input _
    .Split(","c) _
    .Select(Function(m) Regex.Replace(m.Trim(), "\p{Zs}{2,}", " ")) _
    .ToArray())

End Function

This is a simplistic testing framework that measures very quick operations, where every tick matters (the point of the 20 loop iterations is precisely to make the measurements a bit more reliable). Even changing the order in which the methods are called affects the results.
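
One possible way to reduce the warm-up and ordering effects mentioned above is sketched below, under several assumptions: each method is run a few times untimed before the measured loop, the measured loop is much longer than 20 iterations, and the regex version reuses a single Regex instance created with RegexOptions.Compiled. The names Measure, WithSharedRegex and MultiSpace, as well as the iteration counts, are illustrative and not part of the original framework, and no timings are claimed for this version.

```vbnet
Imports System.Diagnostics
Imports System.Linq
Imports System.Text.RegularExpressions

Module BenchmarkSketch

    ' Assumption: one shared, precompiled Regex instead of the static Regex.Replace call
    Private ReadOnly MultiSpace As New Regex("\p{Zs}{2,}", RegexOptions.Compiled)

    Function WithSharedRegex(input As String) As String
        Return String.Join(",", _
            input _
            .Split(","c) _
            .Select(Function(m) MultiSpace.Replace(m.Trim(), " ")) _
            .ToArray())
    End Function

    ' Hypothetical helper: untimed warm-up runs first, then one timed loop
    Function Measure(f As Func(Of String, String), input As String, iterations As Integer) As Long
        For i As Integer = 1 To 100
            f.Invoke(input)
        Next

        Dim sw As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To iterations
            f.Invoke(input)
        Next
        sw.Stop()

        Return sw.ElapsedTicks
    End Function

    Sub Main()
        Dim input As String = "Ireland, UK, United States of America,     Belgium, Germany   , Some     Country"
        Console.WriteLine("Shared regex: " & Measure(AddressOf WithSharedRegex, input, 10000).ToString())
    End Sub

End Module
```

The other methods can be passed to Measure in the same way (e.g. AddressOf withoutRegex), so that all of them are timed under identical conditions.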

In any case, the differences among methods have remained more or less consistent across all my tests. The typical values I got (ElapsedTicks summed over the 20 iterations) are:

With: 2500-2700
Without: 1100-1300
Without2: 900-1200

NOTE: since this is a general criticism of regex performance (at least in simple situations that can easily be handled with alternatives along the lines of what I am showing here), any advice on how to improve regex performance in .NET is more than welcome. But please avoid vague, generic statements and be as specific as possible (e.g., by suggesting changes to the proposed testing framework).

Upvotes: 2

Wiktor Stribiżew

Reputation: 626728

You can achieve that by splitting the input on commas, trimming each piece and collapsing multiple spaces into one, and then joining the pieces back together with String.Join.

Just showing how it can be done using LINQ:

Console.Write(String.Join(",", _
    "Ireland, UK, United States of America,     Belgium, Germany   , Some     Country," _
     .Split(","c) _
     .Select(Function(m) Regex.Replace(m.Trim(), "\p{Zs}{2,}", " ")) _
     .ToArray()))

The key part is Regex.Replace(m.Trim(), "\p{Zs}{2,}", " "), which collapses each run of two or more space separators into a single space.

Result: Ireland,UK,United States of America,Belgium,Germany,Some Country,
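
As a small illustration of what \p{Zs} covers, the sketch below uses a made-up input containing two no-break spaces (U+00A0), which belong to the Unicode "space separator" category just like the ordinary space. The sample string and module name are assumptions for the example only.

```vbnet
Imports System.Text.RegularExpressions

Module ZsSketch

    Sub Main()
        ' Two no-break spaces (U+00A0) between the words; made-up input for illustration
        Dim sample As String = "Some" & ChrW(&HA0) & ChrW(&HA0) & "Country"

        ' \p{Zs}{2,} matches the run of no-break spaces and replaces it with one ordinary space
        Console.WriteLine(Regex.Replace(sample, "\p{Zs}{2,}", " ") = "Some Country") ' prints True

        ' String.Replace looks only for two ordinary spaces (U+0020), so nothing changes
        Console.WriteLine(sample.Replace("  ", " ") = sample) ' prints True
    End Sub

End Module
```

Note that tabs and line breaks are not in the Zs category, so this pattern leaves them untouched.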

Upvotes: 4
