Globuous

Reputation: 63

Regex order when matching single square bracket

Hello to all of you,

I have a question about a regex behavior that seems to be specific to Elisp. I'm trying to match a single square bracket, and ielm gives me this:

  (string-match "[\]\[]" "[")  ; ===> 0
  (string-match "[\[\]]" "[")  ; ===> nil

  (string-match "[\]\[]" "]")  ; ===> 0
  (string-match "[\[\]]" "]")  ; ===> nil

  (string-match "[\[\]]" "[]") ; ===> 0
  (string-match "[\]\[]" "[]") ; ===> 0
  (string-match "[\]\[]" "][") ; ===> 0
  (string-match "[\]\[]" "][") ; ===> 0

Whereas in JS, all of these match:

'['.match(/[\[\]]/) // ===>['[']
'['.match(/[\]\[]/) // ===>['[']


']'.match(/[\[\]]/) // ===>[']']
']'.match(/[\]\[]/) // ===>[']']

'[]'.match(/[\[\]]/) // ===>['[']
'[]'.match(/[\]\[]/) // ===>['[']
']['.match(/[\[\]]/) // ===>[']']
']['.match(/[\]\[]/) // ===>[']']

Here's a regex101: https://regex101.com/r/e8sLXr/1

I don't understand why the order of my square brackets matters in Elisp. I've tried using double backslashes, but it doesn't help. In fact, it gives me more nils on these regexes, even though I thought the proper way to escape a backslash in a string for the regex to process was to double it: https://www.gnu.org/software/emacs/manual/html_node/elisp/Regexp-Example.html#Regexp-Example

Does anyone know what I'm missing and could help me?

Cheers,

Thomas

EDIT: grammar

Upvotes: 1

Views: 565

Answers (1)

phils

Reputation: 73274

Firstly, let's ditch the backslashes. [ and ] are not special to strings(*), and therefore escaping them does not change them. So the following is equivalent, and easier to read:

(string-match "[][]" "[")  ; ===> 0
(string-match "[][]" "]")  ; ===> 0
(string-match "[][]" "[]") ; ===> 0
(string-match "[][]" "][") ; ===> 0
(string-match "[][]" "][") ; ===> 0

This pattern matches either ] or [, and all the strings being tested have one of those characters at the start; hence we match at position 0 in each case.
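To make the two points above concrete, here is a small sketch you can evaluate in ielm. The test strings "ab]c" and "abc" are illustrative and not from the question; they just show that the return value is the index of the first bracket rather than always 0.

```elisp
;; The Lisp reader drops a backslash that precedes a character with no
;; special meaning in a string, so both literals denote the same
;; four-character string.
(string= "[\]\[]" "[][]")       ; ===> t

;; `string-match' returns the index of the first match, or nil.
(string-match "[][]" "ab]c")    ; ===> 2
(string-match "[][]" "abc")     ; ===> nil
```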

Critically, to include a ] in a character alternative it must be the first character (or the first one after a leading ^ in a complemented set). Hence the following did not do what you wanted:

(string-match "[[]]" "[")  ; ===> nil
(string-match "[[]]" "]")  ; ===> nil
(string-match "[[]]" "[]") ; ===> 0

This pattern matches exactly [], because [[] is a character alternative matching anything in the set comprising the single-character [; and that character alternative is then followed by ] (which, when it is not ending a character alternative, just matches itself).
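As a rough illustration of that reading, the sketch below runs both patterns against longer strings. The test strings are made up for the example, and the `rx` form at the end is only one possible alternative (it assumes the `rx` macro that ships with Emacs), not something the original question used.

```elisp
;; "[[]]" is the set [[] (just "[") followed by a literal "]",
;; so it only matches where "[" is immediately followed by "]".
(string-match "[[]]" "a[]b")    ; ===> 1
(string-match "[[]]" "a[b]")    ; ===> nil

;; With "]" placed first it becomes a member of the set instead of
;; terminating it, so either bracket matches on its own.
(string-match "[][]" "a[b]")    ; ===> 1

;; `rx' builds the character alternative from its members, so you do
;; not have to think about where "]" goes.
(string-match (rx (any "[]")) "a]b")    ; ===> 1
```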

You will want to read the "character alternative" details at:

C-h i g (elisp)Regexp Special RET


(*) Note also that backslashes are not special to a regexp when they are within a character alternative.

Your regexps didn't have any backslashes -- because in double-quoted string format you would have needed to double the backslashes to include those in the regexp -- but if you had done that, and if they were also inside the character alternative, it would just mean that a backslash would be one of the characters matched by that set.

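e.g. "[\\]\\[]" is the regexp [\]\[], which matches the three-character sequence \[] (the set [\] matches a backslash, \[ matches a literal [, and the trailing ] matches itself).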

(Remembering that ] cannot appear in a character alternative unless it is the first character.)
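The doubled-backslash case can be checked the same way. This is a minimal sketch with made-up test strings; the last pattern, "[][\\]", is an illustrative addition showing one way to write a set that contains both brackets and a backslash.

```elisp
;; The string "[\\]\\[]" is the regexp [\]\[] : the set [\] (a
;; backslash), then \[ (a quoted "["), then a literal "]".
(string-match "[\\]\\[]" "\\[]")    ; ===> 0  (target is the 3 characters \[])
(string-match "[\\]\\[]" "[")       ; ===> nil

;; A set containing "]", "[" and a backslash: "]" still goes first,
;; and the backslash needs no escaping at the regexp level.
(string-match "[][\\]" "a\\b")      ; ===> 1
```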

Upvotes: 1
