Eleno

Reputation: 3016

Emacs Lisp: can the same regexp match two different patterns with the same number of groupings?

I've started writing Emacs scripts according to directions given at http://www.emacswiki.org/emacs/EmacsScripts, which basically say that your scripts should start with:

:;exec emacs --script "$0" $@ 

Now I'd like to customize `auto-mode-interpreter-regexp' accordingly, so that Emacs scripts are automatically opened in `emacs-lisp-mode'.

The original `auto-mode-interpreter-regexp' was meant to match:

#! /bin/bash
#! /usr/bin/env perl

and so on, so its value was:

"\\(?:#![ \t]?\\([^ \t\n]*/bin/env[ \t]\\)?\\([^ \t\n]+\\)\\)"

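For reference, what the two groups of that value capture can be checked on sample first lines with `string-match' and `match-string'. This is only an illustrative sketch: `my-shebang-regexp' and `my-shebang-groups' are hypothetical names, the brackets are spelled out as `[ \t]' (space and tab), and the regexp is anchored at the start of the string to imitate `looking-at' at the beginning of a buffer.

```elisp
;; The original value, brackets spelled out as [ \t] and [^ \t\n].
;; `my-shebang-regexp' is a hypothetical name used only for this example.
(defconst my-shebang-regexp
  "\\(?:#![ \t]?\\([^ \t\n]*/bin/env[ \t]\\)?\\([^ \t\n]+\\)\\)")

(defun my-shebang-groups (line)
  "Return (GROUP1 GROUP2) if `my-shebang-regexp' matches at the start of LINE."
  ;; "\\`" anchors the match at the start of LINE, like `looking-at'.
  (when (string-match (concat "\\`" my-shebang-regexp) line)
    (list (match-string 1 line) (match-string 2 line))))

(my-shebang-groups "#! /bin/bash")          ; => (nil "/bin/bash")
(my-shebang-groups "#! /usr/bin/env perl")  ; => ("/usr/bin/env " "perl")
```

Group 1 only holds the optional `env' prefix; the interpreter itself always ends up in group 2.
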
I tried adding the new regexp as an alternative:

(setq auto-mode-interpreter-regexp
   (concat ;; match "#! /bin/bash", "#! /usr/bin/env perl", etc.
           "\\(?:#![ \t]?\\([^ \t\n]*/bin/env[ \t]\\)?\\([^ \t\n]+\\)\\)"
           ;; or
           "\\|"
           ;; match ":;exec emacs "
           "\\(?::;[ \t]?\\(exec\\)[ \t]+\\([^ \t\n]+\\)[ \t]*\\)"))

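To see what this alternation actually captures on a `:;exec' line, one can list all four groups after a match. A minimal sketch, assuming the hypothetical names `my-alternation-regexp' and `my-alternation-groups' and the same `[ \t]' spelling of the brackets:

```elisp
;; The alternation above, brackets spelled out as [ \t] and [^ \t\n].
;; `my-alternation-regexp' is a hypothetical name for this example.
(defconst my-alternation-regexp
  (concat "\\(?:#![ \t]?\\([^ \t\n]*/bin/env[ \t]\\)?\\([^ \t\n]+\\)\\)"
          "\\|"
          "\\(?::;[ \t]?\\(exec\\)[ \t]+\\([^ \t\n]+\\)[ \t]*\\)"))

(defun my-alternation-groups (line)
  "Return groups 1 to 4 of `my-alternation-regexp' matched against LINE."
  (when (string-match my-alternation-regexp line)
    (mapcar (lambda (n) (match-string n line)) '(1 2 3 4))))

;; Groups 1 and 2 belong to the "#!" branch, which did not match here,
;; so the interpreter ends up in group 4 instead of group 2.
(my-alternation-groups ":;exec emacs --script \"$0\" $@")
;; => (nil nil "exec" "emacs")
```
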
but although this one matches the whole line, it does not capture the submatches where they are needed, in particular the second group, which is the one used to detect the interpreter. So I merged the two regexps into one that matches both headers:

(setq auto-mode-interpreter-regexp
    (concat ;; match "#!" or ":;"
            "\\(?:#!\\|:;\\)"
            ;; optional space or tab
            "[ \t]?"
            ;; optionally match "/usr/bin/env " or "exec " (group 1)
            "\\([^ \t\n]*/bin/env[ \t]\\|exec[ \t]\\)?"
            ;; match the interpreter (group 2)
            "\\([^ \t\n]+\\)"))

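A quick check that the merged regexp leaves the interpreter in group 2 for both kinds of header. This is a sketch: `my-merged-regexp' and `my-interpreter-group' are hypothetical names, and the character classes are spelled out as `[ \t]' and `[^ \t\n]'.

```elisp
;; The merged regexp; `my-merged-regexp' is a hypothetical name.
(defconst my-merged-regexp
  (concat "\\(?:#!\\|:;\\)"                            ; "#!" or ":;"
          "[ \t]?"                                     ; optional space or tab
          "\\([^ \t\n]*/bin/env[ \t]\\|exec[ \t]\\)?"  ; group 1: env or exec
          "\\([^ \t\n]+\\)"))                          ; group 2: interpreter

(defun my-interpreter-group (line)
  "Return group 2 of `my-merged-regexp' matched at the start of LINE, or nil."
  (when (string-match (concat "\\`" my-merged-regexp) line)
    (match-string 2 line)))

(my-interpreter-group "#! /bin/bash")                    ; => "/bin/bash"
(my-interpreter-group "#! /usr/bin/env perl")            ; => "perl"
(my-interpreter-group ":;exec emacs --script \"$0\" $@") ; => "emacs"
```
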
Could I have done better? Thank you.

Upvotes: 7

Views: 655

Answers (2)

Thomas

Reputation: 17422

The groupings of a regexp are defined by the parentheses that appear in it. That's why the second of your three regexps matches but cannot be used in this case: "exec" and "emacs" are captured in groups 3 and 4 respectively, but auto-mode-interpreter-regexp expects the name of the script interpreter to be in group 2.

(EDIT: What I've written above is wrong, except for the relevance of group 2 for auto-mode-interpreter-regexp. See huaiyuan's answer for insights.)

From the documentation of said variable:

Regexp matching interpreters, for file mode determination. This regular expression is matched against the first line of a file to determine the file's mode in `set-auto-mode'. If it matches, the file is assumed to be interpreted by the interpreter matched by the second group of the regular expression.
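
Roughly, `set-auto-mode' runs this regexp on the first line, takes group 2, drops any directory part, and looks the resulting name up in `interpreter-mode-alist'. The helper below is a simplified sketch of that lookup, not the real implementation; `my-mode-for-first-line' is a hypothetical name, and the alist is passed in explicitly so the example does not depend on your configuration.

```elisp
;; Simplified sketch of the first-line lookup; NOT the actual code of
;; `set-auto-mode'.  `my-mode-for-first-line' is a hypothetical helper.
(defun my-mode-for-first-line (text regexp alist)
  "Return the mode ALIST associates with the interpreter REGEXP finds in TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (when (looking-at regexp)
      ;; Only group 2 is used; its directory part is dropped before lookup.
      (cdr (assoc (file-name-nondirectory (match-string 2)) alist)))))

;; "/bin/bash" in group 2 becomes "bash", which the alist maps to `sh-mode'.
(my-mode-for-first-line
 "#! /bin/bash\n"
 "\\(?:#![ \t]?\\([^ \t\n]*/bin/env[ \t]\\)?\\([^ \t\n]+\\)\\)"
 '(("bash" . sh-mode)))
;; => sh-mode
```
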

I think your final solution looks pretty good though. Two comments:

  1. The original regexp is wrapped in \\(?:...\\), a shy group. This has no influence on the match itself, but it helps when you combine the regexp with others, for instance when you append a postfix operator:

    (setq my-regexp (concat auto-mode-interpreter-regexp "?"))

  2. Your regexp now also matches things like #!exec ..., which may not be a problem. This happens because you factored out the shebang, which is the right thing to do, since (match-string 1) should match the (/usr)/bin/env or exec part without including the shebang.
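
Both points can be tried out directly with `string-match'. This is an illustrative sketch using made-up sample strings; the `perl' regexp in the first part is only a toy stand-in for a regexp ending in a group.

```elisp
;; Item 1: "?" appended to an unwrapped regexp applies only to its last
;; element (here the group), so "#!" alone still matches...
(string-match "\\`#!\\(perl\\)?\\'" "#!")          ; => 0
;; ...whereas after a shy group it makes the whole regexp optional.
(string-match "\\`\\(?:#!\\(perl\\)\\)?\\'" "#!")  ; => nil

;; Item 2: with the prefixes factored out, "#!" followed by "exec" is
;; accepted too (character classes spelled out as [ \t] and [^ \t\n]).
(let ((merged (concat "\\(?:#!\\|:;\\)[ \t]?"
                      "\\([^ \t\n]*/bin/env[ \t]\\|exec[ \t]\\)?"
                      "\\([^ \t\n]+\\)"))
      (line "#!exec emacs"))
  (when (string-match merged line)
    (match-string 2 line)))                          ; => "emacs"
```
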

Upvotes: 1

huaiyuan

Reputation: 26549

Regexps in Emacs support the "explicitly numbered group" construct, which assigns a chosen group number to any submatch. See the Elisp manual, section 34.3.1.3, Backslash Constructs in Regular Expressions.

The syntax is ‘\(?num: ... \)’, where num is the chosen group number; inside a Lisp string the backslashes are doubled, as in "\\(?2: ... \\)".
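
Applied to the question, explicitly numbered groups make the original alternation usable: the `:;exec' branch can force its submatches into groups 1 and 2, the same numbers the `#!' branch uses implicitly. This is a sketch rather than a tested configuration; the brackets are written as `[ \t]' and `[^ \t\n]'.

```elisp
;; The alternation from the question, with the ":;exec" branch using
;; explicitly numbered groups so the interpreter is always in group 2.
(setq auto-mode-interpreter-regexp
      (concat
       ;; "#! /bin/bash", "#! /usr/bin/env perl", etc. (implicit groups 1, 2)
       "\\(?:#![ \t]?\\([^ \t\n]*/bin/env[ \t]\\)?\\([^ \t\n]+\\)\\)"
       "\\|"
       ;; ":;exec emacs ...": \\(?1:...\\) and \\(?2:...\\) reuse numbers 1, 2
       "\\(?::;[ \t]?\\(?1:exec\\)[ \t]+\\(?2:[^ \t\n]+\\)[ \t]*\\)"))
```

With this value, group 2 holds "emacs" for the `:;exec' header. Whether `interpreter-mode-alist' already maps "emacs" to `emacs-lisp-mode' may depend on your Emacs version; if it does not, an entry such as ("emacs" . emacs-lisp-mode) would presumably be needed as well.
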

Upvotes: 1
