user2550098

Reputation: 193

Excluding \n from a \s match in a Python regex

I want to replace all whitespace characters (except \n) with "". I tried a regular expression with \s+, but it matches the newline character as well.

Is there any method to skip \n in \s in regex?

Upvotes: 5

Views: 2505

Answers (5)

Wiktor Stribiżew

Reputation: 626747

If you do not need to handle Unicode whitespace, you could use

[ \t\r\f\v]

Or, since \v matches a VT (vertical tab, \x0b), \r is also considered a line break, and \f (form feed, \x0c) is also a kind of vertical whitespace (rather obsolete now, though), you can restrict the class to horizontal whitespace only:

[ \t]

See docs:

\s
When the UNICODE flag is not specified, it matches any whitespace character, this is equivalent to the set [ \t\n\r\f\v]. The LOCALE flag has no extra effect on matching of the space. If UNICODE is set, this will match the characters [ \t\n\r\f\v] plus whatever is classified as space in the Unicode character properties database.
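A minimal sketch of the two ASCII-only classes above, using a made-up sample string (the variable names are illustrative, not from the answer). Note that the quoted passage describes Python 2; in Python 3, \s in a str pattern already covers Unicode whitespace unless re.ASCII is passed, which is why the no-break space survives both explicit classes here:

```python
import re

# Sample with space, tab, CR, LF, VT, FF and a no-break space (U+00A0).
text = "a \t b\r\nc\x0b\x0cd\u00a0e"

# ASCII whitespace minus the line feed: only \n (and the non-ASCII space) survive.
no_lf = re.sub(r"[ \t\r\f\v]", "", text)

# Horizontal ASCII whitespace only: every vertical character is kept.
horiz_only = re.sub(r"[ \t]", "", text)

print(repr(no_lf))       # 'ab\ncd\xa0e'
print(repr(horiz_only))  # 'ab\r\nc\x0b\x0cd\xa0e'
```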

If you need to support all Unicode spaces, use

\s(?<!\n)

This expression will match any whitespace that is not a line feed.
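As a rough illustration, assuming Python 3 (where \s in a str pattern matches Unicode whitespace by default), the lookbehind version also removes non-ASCII spaces while leaving the line feed alone. The sample string is invented for the example:

```python
import re

# No-break space (U+00A0), em space (U+2003), a line feed and a plain space.
text = "a\u00a0b\u2003c\nd e"

# \s consumes one whitespace character; the lookbehind then rejects
# the match if the character just consumed was a line feed.
result = re.sub(r"\s(?<!\n)", "", text)

print(repr(result))  # 'abc\nde'
```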


Another way to restrict a positive shorthand character class is to use its opposite inside a negated character class. \S is the opposite shorthand class of \s, so we put it into [^...] and add the characters from \s that we need to exclude:

[^\S\n]

Add \r, \v, etc. if you need to exclude all line breaks. [^\S\n] matches any character that is neither a non-whitespace character nor a line feed, i.e. any whitespace character except \n.
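A short sketch of the negated-class approach, with \r added as a second excluded character. The sample string and variable names are only for illustration:

```python
import re

text = "x \t y\r\nz\u00a0w\n"

# Any whitespace except the line feed (the carriage return is removed).
keep_lf = re.sub(r"[^\S\n]", "", text)

# Any whitespace except carriage return and line feed.
keep_crlf = re.sub(r"[^\S\r\n]", "", text)

print(repr(keep_lf))    # 'xy\nzw\n'
print(repr(keep_crlf))  # 'xy\r\nzw\n'
```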

Upvotes: 7

Casimir et Hippolyte

Reputation: 89557

You can use the negated character class [^\S\n], where \S matches anything that is not whitespace:

re.sub(r'[^\S\n]', '', s)

Upvotes: 1

piglei

Reputation: 1208

The documentation says that \s matches [ \t\n\r\f\v]. So you just need to replace \s+ with [ \t\r\f\v]+ in order to skip \n.
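A possible way to try this out, with an invented sample string. When the replacement is the empty string, the + quantifier does not change the result; it only groups each run into a single substitution, which re.subn makes visible:

```python
import re

s = "one  two\n three\t\tfour"

# With +, each run of whitespace is replaced in one go.
with_plus, n_plus = re.subn(r"[ \t\r\f\v]+", "", s)

# Without +, every whitespace character is a separate substitution.
without_plus, n_single = re.subn(r"[ \t\r\f\v]", "", s)

print(repr(with_plus), n_plus)        # 'onetwo\nthreefour' 3
print(repr(without_plus), n_single)   # 'onetwo\nthreefour' 5
```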

Upvotes: 2

Avinash Raj

Reputation: 174696

Is there any method to skip \n in \s in regex?

You may use negative lookahead.

re.sub(r'(?!\n)\s', '', s)

If you also want to skip carriage returns, add \r inside the negative lookahead.

re.sub(r'(?!\n|\r)\s', '', s)

It's a kind of subtraction, i.e., the regex above subtracts \n and \r from \s.
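To see the "subtraction" in action, here is a small sketch that runs both lookahead patterns on a made-up string containing a Windows-style line ending:

```python
import re

text = "a b\tc\r\nd\x0ce"

# The lookahead fails at a line feed, so \s never gets to consume it.
skip_lf = re.sub(r"(?!\n)\s", "", text)

# Same idea, with the carriage return excluded as well.
skip_crlf = re.sub(r"(?!\n|\r)\s", "", text)

print(repr(skip_lf))    # 'abc\nde'
print(repr(skip_crlf))  # 'abc\r\nde'
```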

Upvotes: 0

Maroun

Reputation: 95958

\s matches [ \t\n\r\f\v]; if you want to remove only literal spaces, you can use the following:

>>> import re
>>> re.sub(' ', '', 'test   string\nwith  new line')

Since ' ' matches a space (literally), this will remove all spaces but will keep the \n character.
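For reference, a small sketch of what the call above produces. The second string is an invented example showing that this pattern leaves tabs untouched, so it only fits if literal spaces are the sole whitespace to remove:

```python
import re

# The example from the answer: spaces go, the newline stays.
cleaned = re.sub(' ', '', 'test   string\nwith  new line')

# A literal-space pattern does not touch tabs.
with_tab = re.sub(' ', '', 'a\tb c')

print(repr(cleaned))   # 'teststring\nwithnewline'
print(repr(with_tab))  # 'a\tbc'
```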

Upvotes: 0
