Widunder

Reputation: 295

C# Replacing Multiple Spaces with 1 space leaving special characters intact

I'm having a bit of a problem: I have to translate a string into a table. I'd like to remove multiple spaces, but not all of them. The text comes back with lots of spaces between the values, like so:

 SESSIONNAME       USERNAME                 ID  STATE   TYPE        DEVICE\r\n 
 services                                    0  Disc                      \r\n 
 console                                     1  Conn                      \r\n 
                   alinav                    2  Disc                      \r\n  
 rdp-tcp                                 65536  Listen                    \r\n  

I would like to keep the \r\n sequences that define my rows, keep the empty values that are legitimate under some columns, and keep a space to separate the columns. But I want to remove the extra spaces so that they are not fed into the values.

I've tried:

output = Regex.Replace(output, @"\s{2,}", " ", RegexOptions.Multiline);

output = output.Replace("  ", " ");

But the first one removes everything (both what I need and what I don't), and the second one still leaves too many spaces.

Thanks.

Upvotes: 1

Views: 929

Answers (2)

Dour High Arch

Reputation: 21722

In your example the data is delimited by position, not by characters; is that correct? If so, you should extract by position; something like:

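// Split on the line break; Split() with no arguments splits on every whitespace character.
var rows = output.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
foreach (string s in rows.Skip(1)) // skip the header row (requires using System.Linq)
{
    // Offsets and widths are approximate; check them against the real header.
    var sessionName = s.Substring(0, 18).Trim();
    var userName = s.Substring(18, 19).Trim();
    var id = Int32.Parse(s.Substring(37, 8).Trim());
    var whateverType = s.Substring(45, 12).Trim();
    var device = s.Substring(57, 6).Trim();
}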

Of course you need to do proper error checking, and should probably put the field widths in an array and calculate positions instead of hard-coding them as I have shown.
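As a rough sketch of the "widths in an array" idea, the following computes each start position from the widths and tolerates rows that are shorter than the full layout. The class name FixedWidthParser, the method names, and the widths in ExampleWidths are all assumptions for illustration; the real widths have to be measured against the actual header row.

```csharp
using System;
using System.Collections.Generic;

static class FixedWidthParser
{
    // Hypothetical column widths; measure them against the real header row.
    public static readonly int[] ExampleWidths = { 18, 19, 8, 12, 6 };

    // Cuts one row into trimmed fields. A row that ends early yields
    // empty strings for the columns it does not reach.
    public static string[] SplitRow(string row, int[] widths)
    {
        var fields = new string[widths.Length];
        int start = 0;
        for (int i = 0; i < widths.Length; i++)
        {
            if (start >= row.Length)
            {
                fields[i] = string.Empty;
            }
            else
            {
                // Clamp the length so Substring never runs past the end.
                int length = Math.Min(widths[i], row.Length - start);
                fields[i] = row.Substring(start, length).Trim();
            }
            start += widths[i];
        }
        return fields;
    }

    // Splits the whole output into rows, then each row into fields.
    public static List<string[]> Parse(string output, int[] widths)
    {
        var rows = new List<string[]>();
        var lines = output.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            if (line.Trim().Length == 0) continue; // skip blank lines
            rows.Add(SplitRow(line, widths));
        }
        return rows;
    }
}
```

Calling FixedWidthParser.Parse(output, FixedWidthParser.ExampleWidths) returns one string array per row, header included, with an empty string wherever a column is blank. Parsing the ID into a number is left to the caller, since the header row would not parse.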

Upvotes: 2

willeM_ Van Onsem

Reputation: 477607

You can do two things:

Use a literal space in the regular expression. \s also matches other whitespace characters such as \n, \r and \t, which is why your pattern removed the line breaks as well. Thus:

output = Regex.Replace(output, @" {2,}", " ");

Or apply the second method until convergence:

string s2 = output;
do {
    output = s2;
    s2 = s2.Replace("  "," ");
} while(output != s2);

In most cases the first method will outperform the second one. This is because the first method handles each run of spaces in a single pass, whereas the second rescans the whole string on every iteration. Regexes are in general a bit slower than simple string replacement, but if the string contains long runs of spaces, the first method will be faster.
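To make the two approaches easy to compare, here is a minimal sketch that wraps each in a method. The class name SpaceCollapser and both method names are made up for illustration. Only the literal space character is matched, so \r, \n and \t pass through unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

static class SpaceCollapser
{
    // Collapses each run of two or more literal spaces into one space.
    // Line breaks and tabs are not matched, so row boundaries survive.
    public static string CollapseSpaces(string input)
    {
        return Regex.Replace(input, " {2,}", " ");
    }

    // Same result via repeated string replacement until nothing changes.
    public static string CollapseByLoop(string input)
    {
        string previous;
        string current = input;
        do
        {
            previous = current;
            current = current.Replace("  ", " ");
        } while (previous != current);
        return current;
    }
}
```

Note that once the spaces are collapsed, a blank column can no longer be told apart from its neighbours: a row with an empty USERNAME ends up with the same number of separators as a row with an empty SESSIONNAME. If empty columns matter, extracting by position is the safer route.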

Upvotes: 3
