Jackson Ray Hamilton

Reputation: 9466

Why do certain characters need to be escaped in this Elixir regular expression?

My ultimate goal is to write a JavaScript function that escapes all Erlang regex metacharacters, so that I can construct a Mango $regex query for CouchDB 2 from my HTML5 application using PouchDB and pouchdb-find. I want to search for a substring in a field of the objects in my database without the trouble of setting up couchdb-lucene, if I can avoid it and if that tool isn't needed.
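
To see why escaping matters, here is a minimal sketch of a substring selector of the shape { field: { $regex: term } } and what happens when the raw search term is treated as a regex. The helper naiveSubstringSelector and the field name title are hypothetical, and JavaScript's RegExp stands in for PCRE only because both treat these particular terms the same way.

```javascript
// Hypothetical helper: a Mango selector matching documents whose `field`
// contains `term`. The term is passed through unescaped, which is the problem.
function naiveSubstringSelector(field, term) {
  return { [field]: { $regex: term } };
}

console.log(JSON.stringify(naiveSubstringSelector('title', '1.5')));
// {"title":{"$regex":"1.5"}}

// "." is a wildcard, so searching for "1.5" also finds "125".
console.log(new RegExp('1.5').test('version 125')); // true

// An unbalanced "(" is not a search term at all, just an invalid pattern.
let threw = false;
try {
  new RegExp('f(x');
} catch (e) {
  threw = e instanceof SyntaxError;
}
console.log(threw); // true
```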

While writing this escaping function, I found that Elixir's Regex module already implements one (Regex.escape/1):

{:ok, pattern} = :re.compile(~S"[.^$*+?()\[\]{}\\\|\s#-]", [:unicode])
@escape_pattern pattern

@spec escape(String.t) :: String.t
def escape(string) when is_binary(string) do
  :re.replace(string, @escape_pattern, "\\\\&", [:global, {:return, :binary}])
end

I am trying to figure out how to translate this expression to JavaScript, and in the process I am trying to understand the regular expression syntax of Elixir and Erlang, which I understand to be based on PCRE.
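
A literal translation is fairly mechanical. The sketch below (the name escapeLiteralTranslation is hypothetical) copies the character class unchanged, since JavaScript also accepts the redundant backslash in \| inside a class, and maps Erlang's & (the whole match) in the replacement to JavaScript's $&.

```javascript
// Character class copied verbatim from the Elixir source.
const ESCAPE_PATTERN = /[.^$*+?()\[\]{}\\\|\s#-]/g;

// Erlang's replacement "\\&" (backslash + whole match) becomes '\\$&' here.
function escapeLiteralTranslation(string) {
  return string.replace(ESCAPE_PATTERN, '\\$&');
}

console.log(escapeLiteralTranslation('a.b|c')); // a\.b\|c
```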

Escaping the [ and ] characters makes sense, since they appear inside a bracketed expression themselves. So does escaping \, since it is the escape character.
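
Inside a bracketed class most metacharacters lose their special meaning; the ones that can still be special are ], \, ^ (at the start) and - (between two characters). A small check, using JavaScript regexes as a stand-in for PCRE, which agrees on these cases:

```javascript
// Inside [...], ".", "*", "(" and "$" are ordinary characters.
const literalClass = /^[.*($]$/;
console.log(['.', '*', '(', '$'].every((c) => literalClass.test(c))); // true
console.log(literalClass.test('x')); // false: "." is not a wildcard here

// "]" and "\" must be escaped to appear inside a class.
const bracketAndBackslash = /^[\]\\]$/;
console.log(bracketAndBackslash.test(']'), bracketAndBackslash.test('\\')); // true true

// An unescaped "-" between two characters forms a range ("+" to "." includes ",").
console.log(/^[+-.]$/.test(','), /^[+\-.]$/.test(',')); // true false
```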

But why are \| and \s being escaped?

Answers (1)

Jackson Ray Hamilton

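As Lucas Trzesniewski and Dogbert deduced in the comments, the backslash in \| is unnecessary, because | is already literal inside a character class. (| itself must stay in the class: outside a class it is the alternation operator, so it still has to be escaped in the output.)

\s is in the class so that whitespace gets escaped. If the regex has the x (extended) flag, unescaped whitespace in the pattern is ignored, so escaping it produces a regex that works whether or not the x flag is present:

{"a b" =~ ~r/a b/, "a b" =~ ~r/a b/x, "a b" =~ ~r/a\ b/x} #=> {true, false, true}

Under the same flag an unescaped # starts a comment, which would explain why # is escaped too.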
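
JavaScript has no x flag, so the whitespace and # behaviour can't be reproduced there, but the point about | can be checked directly. A minimal sketch, with JavaScript regexes standing in for PCRE, which behaves the same way here:

```javascript
// Inside a class, "|" is literal with or without the backslash.
console.log(/^[|]$/.test('|'), /^[\|]$/.test('|')); // true true

// Outside a class, an unescaped "|" is alternation, so the escaped output
// for a search term such as "a|b" must still contain "\|".
console.log(new RegExp('a|b').test('b'));     // true: "a" OR "b"
console.log(new RegExp('a\\|b').test('b'));   // false
console.log(new RegExp('a\\|b').test('a|b')); // true: literal "a|b"
```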

Here's the escaping function I ended up with:

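function escapeRegex (string) {
  // Prefix each PCRE metacharacter, whitespace, # and - with a backslash.
  // | needs no backslash inside the class, but it must be in the class
  // because an unescaped | in the output would mean alternation.
  return string.replace(/[.^$*+?()\[\]{}|\\\s#-]/g, '\\$&');
}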
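
As a usage sketch, the escaped term can feed a pouchdb-find selector, which would then be passed as the selector of db.find(...). The helper substringSelector and the field name title are hypothetical. The block carries its own copy of escapeRegex with | included in the class so it runs on its own, and it uses JavaScript's RegExp in its default, non-Unicode mode only as a stand-in for PCRE to check that escaped terms match literally. The output is meant for PCRE: a JavaScript RegExp with the u flag would reject escapes such as "\ " and "\#".

```javascript
// Copy of escapeRegex with "|" included in the character class.
function escapeRegex(string) {
  return string.replace(/[.^$*+?()\[\]{}|\\\s#-]/g, '\\$&');
}

// Hypothetical helper: Mango selector for "field contains term".
function substringSelector(field, term) {
  return { [field]: { $regex: escapeRegex(term) } };
}

console.log(JSON.stringify(substringSelector('title', '1.5 (beta)')));
// {"title":{"$regex":"1\\.5\\ \\(beta\\)"}}

// Every escaped term should match itself literally, wherever it appears.
const terms = ['1.5', 'a|b', '[x]', 'C:\\dir', 'price $5^2', '#tag - {x}', 'c++'];
const allLiteral = terms.every((t) => new RegExp(escapeRegex(t)).test(`>>${t}<<`));
console.log(allLiteral); // true
console.log(new RegExp(escapeRegex('1.5')).test('125')); // false
```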
