max pleaner

Reputation: 26778

Regex to split on whitespace but not escaped whitespace

I want to split on standard whitespace " " but not escaped whitespace "\ "

For example, with the string 'my name\ is\ max' (single quotes so \ is literal)

I want to get ["my", "name\ is\ max"]

I've tried this regex: /[^\\]\s/

but the result is this:

=> ["m", "name\\ is\\ max"]

This is close, but I don't know how to keep the y in my.


edit

As another example consider this string:

"./db/users/WGDl-HATof-uhdtT7sPfog: [email protected] name=max\\ p"

I want to split it into three:

[
  "./db/users/WGDl-HATof-uhdtT7sPfog:",
  "[email protected]",
  "name=max\\ p"
]

Upvotes: 1

Views: 637

Answers (3)

akuhn

Reputation: 27803

Try this

require 'shellwords'

'my name\ is\ max'.shellsplit
# => ["my", "name is max"]

No need for a regexp.
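One thing worth noting about the snippet above: shellsplit removes the escaping backslashes, whereas the question asks to keep them. Below is a minimal sketch of one way to get them back, assuming shell-style escaping is acceptable. Shellwords.escape also escapes other shell-special characters, so the result is not guaranteed to be identical to the original input in general.

```ruby
require 'shellwords'

input = 'my name\ is\ max'

# shellsplit follows shell word-splitting rules, so the backslashes are consumed.
words = input.shellsplit
p words    # ["my", "name is max"]

# Re-escape each word to restore a backslash before each space.
escaped = words.map { |w| Shellwords.escape(w) }
p escaped  # ["my", "name\\ is\\ max"]
```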

Upvotes: 2

Mosab Sasi

Reputation: 1130

Try this: "./db/users/WGDl-HATof-uhdtT7sPfog: [email protected] name=max\\ p".split(/(?<![\\])[\s](?![\\])/)

A break-down of the regex (?<![\\])[\s](?![\\]) :

(?<![\\]) This tells the regex engine to match a whitespace not preceded by a backslash "\" (escaped here with another backslash)

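[\s] This is a character class that matches any single whitespace character (space, tab, newline, and so on); the brackets are optional, so \s alone is equivalent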

(?![\\]) This is a negative lookahead: it tells the regex engine to match a whitespace only if it is not followed by a backslash "\" (escaped here with another backslash)
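To make the breakdown concrete, here is a small sketch that runs the pattern on the string from the question. The second input, 'a \b', is a made-up example chosen only to show what the trailing lookahead changes.

```ruby
# Pattern from the answer above: whitespace that is neither preceded
# nor followed by a backslash.
both_sides = /(?<![\\])[\s](?![\\])/

from_question = 'my name\ is\ max'.split(both_sides)
p from_question       # ["my", "name\\ is\\ max"]

# Hypothetical input: a space followed by a backslash.
# The lookahead blocks the split; a lookbehind-only pattern does not.
with_lookahead    = 'a \b'.split(both_sides)
without_lookahead = 'a \b'.split(/(?<!\\)\s/)
p with_lookahead      # ["a \\b"]
p without_lookahead   # ["a", "\\b"]
```

If a whitespace followed by a backslash should still act as a separator, the lookahead can simply be dropped.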

Upvotes: 1

Wiktor Stribiżew

Reputation: 627103

Regarding

I'm trying to split on whitespace that is not preceded by a backslash.

If you only care about backslash before whitespace and there are no other special cases to consider, use a negative lookbehind (?<!\\) before \s:

s.split(/(?<!\\)\s/)

Here, \s matches a single whitespace character if it is not preceded by a backslash ((?<!\\) is a negative lookbehind: it checks whether the text immediately to the left of the current position matches the pattern and, if it does, fails the match). Use \s+ instead of \s to treat a run of whitespace as one separator.
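A short sketch of the difference between \s and \s+ after the lookbehind, using a made-up input with two consecutive spaces:

```ruby
# Hypothetical input: two spaces between "a" and "b", then an escaped space.
line = 'a  b\ c'

# \s splits on each unescaped whitespace separately, leaving an empty field.
single = line.split(/(?<!\\)\s/)
p single  # ["a", "", "b\\ c"]

# \s+ consumes the whole run as one separator.
run = line.split(/(?<!\\)\s+/)
p run     # ["a", "b\\ c"]
```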

If there are multiple consecutive whitespaces to consider, or if escape sequences (such as an escaped backslash) need to be handled, use

s.scan(/(?:[^\s\\]|\\.)+/) 

See the Ruby demo

Here, (?:[^\s\\]|\\.)+ matches one or more occurrences of either a char other than a backslash and whitespace ([^\s\\]) or any escape sequence (\\.). Add the /m modifier to make . match line break chars, too.
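The sketch below illustrates where scan and the lookbehind split diverge. The second input is a made-up edge case: an escaped backslash followed by a real separator.

```ruby
token = /(?:[^\s\\]|\\.)+/

# Runs of whitespace are skipped; escaped spaces stay inside the token.
tokens = 'my  name\ is\ max'.scan(token)
p tokens         # ["my", "name\\ is\\ max"]

# Hypothetical edge case. In a single-quoted literal, '\\\\' is two
# backslashes, so this string is: a, backslash, backslash, space, b.
tricky = 'a\\\\ b'

# scan pairs the two backslashes as one escape, so the space separates.
scanned = tricky.scan(token)
p scanned        # ["a\\\\", "b"]

# The lookbehind only sees one backslash before the space and refuses to split.
split_result = tricky.split(/(?<!\\)\s/)
p split_result   # ["a\\\\ b"]
```

Which behavior is right depends on whether a doubled backslash is meant to be an escaped backslash in the input.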

Upvotes: 2
