Reputation: 88197
How can I have a regular expression that tests for spaces or tabs, but not newlines?
I tried \s, but I found out that it tests for newlines too.
I use C# (.NET) and WPF, but it shouldn't matter.
Upvotes: 145
Views: 319059
Reputation: 139451
Your grammar teacher was likely not a programmer, so use a double-negative:
[^\S\r\n]
That is, not any of: not-whitespace (the capital S complements \s), carriage return, or newline. Distributing the outer not (i.e., the complementing ^ in the character class) with De Morgan’s law, this is equivalent to “whitespace, but not carriage return and not newline.” Including both \r and \n in the pattern correctly handles all of Unix (LF), classic Mac OS (CR), and DOS-ish (CR LF) newline conventions.
If you’re using PCRE, other options are available.
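A minimal C# sketch of the pattern above, assuming .NET's System.Text.RegularExpressions. The class name, method name, and sample string are made up for illustration. It replaces each match with an underscore, so you can see that the space, tab, and non-breaking space are matched while CR and LF are left alone.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonNewlineWhitespaceDemo
{
    // [^\S\r\n]: any whitespace character except carriage return and line feed.
    private static readonly Regex HorizontalWs = new Regex(@"[^\S\r\n]");

    // Hypothetical helper: marks every matched character with '_'.
    public static string Mark(string input) => HorizontalWs.Replace(input, "_");

    public static void Main()
    {
        // space, tab, CR LF, then a non-breaking space (U+00A0)
        string sample = "a b\tc\r\nd\u00A0e";
        Console.WriteLine(Mark(sample) == "a_b_c\r\nd_e"); // True
    }
}
```

Note that this class still matches other whitespace such as form feed and vertical tab, since only \r and \n are excluded.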
Upvotes: 1
Reputation: 1180
Note: For those dealing with CJK text (Chinese, Japanese, and Korean), the double-byte (ideographic) space (Unicode \u3000) was not included in \s for any implementation I've tried so far (Perl, .NET, PCRE, and Python), though this can depend on the engine, its version, and its Unicode options, so check yours. If it is not matched, you'll need to either normalize your strings first (such as by replacing all \u3000 with \u0020), or use a character set that includes this code point in addition to whatever other white space you're targeting, such as [ \t\u3000].
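The two options above (normalizing U+3000 to U+0020, or listing U+3000 explicitly in the class) might look like this in C#. This is only a sketch: the class and method names are hypothetical, and because \s may or may not cover U+3000 in a given engine, the code prints that result rather than assuming it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IdeographicSpaceDemo
{
    // Explicit class: ASCII space, tab, and the ideographic space U+3000.
    private static readonly Regex SpaceTabOrIdeographic = new Regex(@"[ \t\u3000]");

    // Option 1: normalize U+3000 to an ordinary space before matching.
    public static string Normalize(string input) => input.Replace('\u3000', '\u0020');

    // Option 2: match U+3000 directly with the explicit class.
    public static bool HasHorizontalSpace(string input) => SpaceTabOrIdeographic.IsMatch(input);

    public static void Main()
    {
        string sample = "A\u3000B";
        Console.WriteLine(Normalize(sample) == "A B"); // True
        Console.WriteLine(HasHorizontalSpace(sample)); // True
        // Check what your own engine does with \s instead of assuming:
        Console.WriteLine(Regex.IsMatch("\u3000", @"\s"));
    }
}
```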
If you're using Perl or PCRE, you have the option of using the \h shorthand for horizontal whitespace, which appears to include the single-byte space, double-byte space, and tab, among others. See the Match whitespace but not newlines (Perl) question for more detail.
However, this \h shorthand has not been implemented for .NET and C#, as best I've been able to tell.
Upvotes: 7
Reputation: 5235
As Eiríkr Útlendi noted, the accepted solution only considers two white space characters: the horizontal tab (U+0009), and a breaking space (U+0020). It does not consider other white space characters such as non-breaking spaces (which happen to be in the text I am trying to deal with).
A more complete white space character listing is included on Wikipedia and also referenced in the linked Perl answer. A simple C# solution that accounts for these other characters can be built using character class subtraction:
[\s-[\r\n]]
Or, including Eiríkr Útlendi's solution, you get
[\s\u3000-[\r\n]]
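Character class subtraction is not supported by every regex flavor, so here is a short C# sketch of it in use. The class name, method name, and sample text are illustrative assumptions. It removes every whitespace character except CR and LF, so line breaks survive.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ClassSubtractionDemo
{
    // [\s-[\r\n]]: everything in \s, minus carriage return and line feed.
    private static readonly Regex WsExceptNewlines = new Regex(@"[\s-[\r\n]]");

    // Hypothetical helper: deletes all whitespace that is not part of a line break.
    public static string StripHorizontal(string input) => WsExceptNewlines.Replace(input, "");

    public static void Main()
    {
        // non-breaking space, ordinary space, CR LF, then a tab
        string sample = "x\u00A0y \r\n\tz";
        Console.WriteLine(StripHorizontal(sample) == "xy\r\nz"); // True
    }
}
```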
Upvotes: 21
Reputation: 583
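If you want to remove whitespace, the code below worked for me in C#. Note that \s matches any whitespace character, newlines included.
Line = Regex.Replace(Line, "\\s", "");
For Tab (as written, this matches any two consecutive whitespace characters; use "\t" to match a tab specifically)
Line = Regex.Replace(Line, "\\s\\s", "");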
Upvotes: 0
Reputation: 655229
Try this character set:
[ \t]
This matches only a space or a tab.
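As a rough usage sketch in C# (the class and method names here are hypothetical), the same character set with a + quantifier can collapse runs of spaces and tabs into a single space while leaving newlines untouched:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpaceOrTabDemo
{
    // [ \t]+ : one or more spaces or tabs; newlines are not in the set.
    private static readonly Regex SpaceOrTabRun = new Regex(@"[ \t]+");

    // Hypothetical helper: replaces each run of spaces/tabs with one space.
    public static string Collapse(string input) => SpaceOrTabRun.Replace(input, " ");

    public static void Main()
    {
        string sample = "a \t b\nc\t\td";
        Console.WriteLine(Collapse(sample) == "a b\nc d"); // True
    }
}
```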
Upvotes: 54