sclarke81

Reputation: 1769

Regex to remove trailing whitespace and multiple blank lines

I'd like regex expressions to use in a Visual Studio 2013 extension written in C#.

I'm trying to remove trailing whitespace from each line while preserving empty lines. I'd also like to collapse multiple consecutive empty lines into a single one. The existing line endings should be preserved (generally carriage return plus line feed).

So the following text (spaces shown as underscores):

hello_world__


___hello_world_
__
__
hello_world

Would become:

hello_world

___hello_world

hello_world

I've tried a number of different patterns to remove the trailing spaces but I either end up not matching the trailing spaces or losing the carriage returns. I haven't yet tried to remove the multiple empty lines.

Here are a couple of the patterns I've tried so far:

\s+$

(?<=\S)\s+$

Upvotes: 3

Answers (6)

user1945782

Just as a punt, without using Regex, you could always split the document on its end-of-line markers and then write it back out, calling TrimEnd on each line (as highlighted by Anton Semenov)...

(Assuming a text document read into a string...)

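//  Sample input that mixes \r\n, \r and \n line endings.
string str = "This is a test    \r\nto see if I can force   \ra string to be broken \non multiple lines           \r\n into an array.";
//  Split on any of the three line endings. Note that RemoveEmptyEntries also drops every blank line.
string[] t = str.Split(new string[] { "\r\n", "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries);
//  thediv is assumed to be a server-side <div> control on the page.
thediv.InnerHtml = str + "<br /><br />";
foreach (string s in t)
{
    //  TrimEnd() strips the trailing whitespace from each line.
    thediv.InnerHtml += s.TrimEnd() + "<br />";
}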

I haven't timed this at all, but if you prefer to avoid the complications of Regex (which I do if I can - see below*), you should find this fast enough to do what you want.

* I avoid Regex if I can. That doesn't mean that I don't use it. Regex has its place, but I believe it to be a last resort tool for involved jobs, for instance complex flexible strings that adhere to a format - something where the alternative will generate large amounts of code. Keeping Regex to an absolute minimum aids the readability of your code.
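
For what it's worth, the split-and-trim idea above throws away every blank line (because of RemoveEmptyEntries) and does not put the line endings back. A minimal sketch of a variant that keeps a single blank line and re-joins with one known line ending might look like the following. The LineCleaner class and Clean method are hypothetical names, and the sketch assumes the whole document uses the same line ending (\r\n by default):

```csharp
using System;
using System.Collections.Generic;

public static class LineCleaner
{
    // Hypothetical helper, not part of the answer above.
    // Assumes every line in the text ends with the same newline string.
    public static string Clean(string text, string newline = "\r\n")
    {
        // StringSplitOptions.None keeps blank lines as empty entries.
        string[] lines = text.Split(new[] { newline }, StringSplitOptions.None);
        var result = new List<string>();
        bool previousBlank = false;

        foreach (string line in lines)
        {
            // Strip trailing spaces and tabs only; leading indentation is kept.
            string trimmed = line.TrimEnd(' ', '\t');
            bool blank = trimmed.Length == 0;

            // Skip a blank line that directly follows another blank line.
            if (blank && previousBlank)
            {
                continue;
            }

            result.Add(trimmed);
            previousBlank = blank;
        }

        // Re-join with the same newline that was used for splitting.
        return string.Join(newline, result);
    }
}
```

Calling LineCleaner.Clean(text) on the sample from the question should give the desired three lines separated by single blank lines. Text with mixed line endings would need a different splitting strategy.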

Upvotes: 1

MBaas

Reputation: 7530

\s also matches line-feed characters, so I would search for runs of plain spaces instead. I do not know the specifics of VS, but this should hopefully do it:

[ ]+$

Upvotes: 0

sclarke81

Reputation: 1769

Thanks for the answers so far. None of them are quite right for what I need, but they've helped me come up with something that is. I think the issue is that there are some oddities with regex in VS2013 (see Using Regular Expressions in Visual Studio). These two operations work for me:

Replace \ +(?=(\n|\r?$)) with nothing.

Replace ^\r?$(\n|\r\n){2,} with \r\n.
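
If the same two operations are run from C# rather than from the Find and Replace dialog, a sketch along these lines should behave the same way. The WhitespaceCleanup class and Clean method are hypothetical names, RegexOptions.Multiline is assumed so that ^ and $ work per line, and the \r\n replacement assumes the document uses CRLF endings:

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceCleanup
{
    // Hypothetical helper name; the two patterns are the ones listed above.
    public static string Clean(string text)
    {
        // Step 1: delete runs of spaces that sit directly before a line ending
        // (a \n, or an optional \r followed by the end of the line).
        string trimmed = Regex.Replace(
            text, @"\ +(?=(\n|\r?$))", "", RegexOptions.Multiline);

        // Step 2: collapse two or more consecutive blank lines into one.
        // The replacement is a literal CRLF, so CRLF input is assumed.
        return Regex.Replace(
            trimmed, @"^\r?$(\n|\r\n){2,}", "\r\n", RegexOptions.Multiline);
    }
}
```

The order matters: trimming first turns whitespace-only lines into truly empty ones, which the second pattern can then collapse. Note that the first pattern only removes spaces, so trailing tabs would be left in place.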

Upvotes: 3

zolo

Reputation: 469

\ +(?=(\n|$))

This matches one or more spaces and uses a lookahead to check that they are followed either by a newline or by the end of the line (or the last characters of your string/text). Multiline and global modes need to be enabled, of course.

Upvotes: 1

user557597

As separate operations -

Remove trailing whitespace on any line: (?m)[^\S\r\n]+$
Remove trailing whitespace only on lines that contain text: (?m)(?<=\S)[^\S\r\n]+$

Remove duplicate blank lines (trimming the whitespace on them as well):

    # Find: (?>\A(?:[^\S\r\n]*\r\n)+)|(?>\r\n(?:[^\S\r\n]*(\r\n)){2,})
    # Replace: $1\r\n


    (?>
         \A 
         (?: [^\S\r\n]* \r \n )+
    )
 |  
    (?>
         \r \n 
         (?:
              [^\S\r\n]* 
              ( \r \n )                     # (1)
         ){2,}
    )
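
As a rough illustration, the find/replace pair above can be applied from C# with Regex.Replace. The BlankLineCollapser class and its method names are hypothetical. The TrimTrailing pattern is an adaptation rather than the pattern from this answer: in .NET, $ with the Multiline option matches only before \n, so a lookahead for an optional \r is assumed here to make the trim work on CRLF text without consuming the \r:

```csharp
using System.Text.RegularExpressions;

public static class BlankLineCollapser
{
    // Find/replace pair copied from the answer above.
    private const string Find =
        @"(?>\A(?:[^\S\r\n]*\r\n)+)|(?>\r\n(?:[^\S\r\n]*(\r\n)){2,})";
    private const string Replacement = "$1\r\n";

    // Collapses runs of blank (or whitespace-only) lines.
    // Blank lines at the very start of the text become a single \r\n.
    public static string Collapse(string text)
    {
        return Regex.Replace(text, Find, Replacement);
    }

    // Adaptation (an assumption, not the answer's pattern): the lookahead
    // (?=\r?$) lets the trim succeed before \r\n while leaving the \r alone.
    public static string TrimTrailing(string text)
    {
        return Regex.Replace(
            text, @"[^\S\r\n]+(?=\r?$)", "", RegexOptions.Multiline);
    }
}
```

Running Collapse and then TrimTrailing on the sample from the question should produce the desired output. Both methods assume CRLF line endings, as the original pattern does.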

Upvotes: 0

Wiktor Stribiżew

Reputation: 626851

To remove multiple blank lines along with the whitespace on them, search for

(?:\r\n[\s-[\r\n]]*){3,}

and replace with \r\n\r\n.

See demo

And to remove the remaining whitespace, you can use

(?m)[\s-[\r]]+\r?$

See demo 2

Upvotes: 1
