nsimplex

Emacs regex word boundary (specifically concerning underscores)

I am trying to replace all occurrences of a whole word (say foo) in Emacs using M-x replace-regexp.

The problem is that I don't want to replace occurrences of foo inside underscored words such as word_foo_word.

If I use \bfoo\b to match foo, it also matches inside the underscored strings, because, as I understand it, Emacs considers underscores to be part of word boundaries, which is different from other regex systems such as Perl.

What would be the correct way to proceed?

Answers (2)

The regexp \<foo\> or \bfoo\b matches foo only when it's not preceded or followed by a word constituent character (syntax code w, usually alphanumerics, so it matches in foo_bar but not in foo1).
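
You can check this from Lisp. The sketch below assumes the syntax table of a fresh buffer (fundamental-mode, which uses the standard syntax table), where the underscore has symbol syntax, as it typically does in programming modes. string-match-p consults the current buffer's syntax table, which is why the form runs inside with-temp-buffer. Evaluate it with M-:.

```elisp
;; Word boundaries (\< \> and \b) only look at word-constituent characters.
;; In a fresh buffer (fundamental-mode, standard syntax table) "_" has
;; symbol syntax, so it counts as a non-word character.
(with-temp-buffer
  (list (char-syntax ?_)                        ; => 95, i.e. ?_ (symbol constituent)
        (string-match-p "\\<foo\\>" "foo_bar")  ; => 0: matches foo before "_"
        (string-match-p "\\bfoo\\b" "foo1")))   ; => nil: "1" is a word constituent
;; => (95 0 nil)
```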

Since Emacs 22, the regexp \_<foo_bar\_> matches foo_bar only when it's not preceded or followed by a symbol-constituent character. Symbol constituents include not only word constituents (alphanumerics) but also the other characters allowed in identifiers (syntax code _), which means underscores in most programming modes. So to replace foo but not the foo in word_foo_word, use \_<foo\_>.
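
A minimal sketch of the same replacement done from Lisp, roughly what M-x replace-regexp RET \_<foo\_> RET bar RET does when run from the beginning of the buffer. In the minibuffer you type single backslashes; inside a Lisp string they are doubled. The helper name my/replace-whole-symbol is hypothetical.

```elisp
;; Replace standalone occurrences of FROM, using symbol boundaries.
;; \_< and \_> treat "_" (symbol syntax in most programming modes) as part
;; of the surrounding name, so the foo inside word_foo_word is left alone.
(defun my/replace-whole-symbol (from to)
  "Replace every standalone symbol FROM with TO in the current buffer.
Hypothetical helper, for illustration only."
  (save-excursion
    (goto-char (point-min))
    (let ((re (concat "\\_<" (regexp-quote from) "\\_>")))
      (while (re-search-forward re nil t)
        ;; FIXEDCASE and LITERAL: insert TO exactly as given.
        (replace-match to t t)))))

;; Usage:
(with-temp-buffer
  (insert "foo word_foo_word (foo) foo1")
  (my/replace-whole-symbol "foo" "bar")
  (buffer-string))
;; => "bar word_foo_word (bar) foo1"
```

The foo inside word_foo_word survives because the underscores are symbol constituents, and foo1 survives because 1 is a word constituent.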

Cheeso

You wrote:

as I understand emacs considers underscores to be part of word boundaries, which is different to other regex systems

The treatment of underscores, like everything else in Emacs, is configurable. This question:
How to make forward-word, backward-word, treat underscore as part of a word?

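...asks for the same thing: making the underscore part of a word.

I think you could solve your problem by changing the syntax of the underscore in the syntax table so that it is a word constituent, and then doing the search/replace. Once _ is part of a word, \bfoo\b no longer finds a boundary between _ and foo, so word_foo_word is skipped.

To do that, you need to know the mode you are using and the name of the syntax table for that mode. In C++ mode, it would be like this:

(modify-syntax-entry ?_ "w" c++-mode-syntax-table)

The "w" signifies "word constituent". Giving the underscore punctuation syntax (".") would not help, because punctuation is not part of a word either, so \bfoo\b would still match. Note that this changes the syntax table shared by every buffer in that mode, so word motion commands such as forward-word will also treat underscores as part of words. For more on that, try M-x describe-function on modify-syntax-entry.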
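
A sketch of the syntax-table approach that avoids changing c++-mode-syntax-table for every buffer: it gives the underscore word syntax ("w") in a temporary copy of the current buffer's syntax table, and uses with-syntax-table so the copy is active only during the replacement. Word syntax is what matters here, since punctuation or symbol syntax would still leave a \b boundary next to the underscore. The helper name my/replace-word-underscore-as-word is hypothetical.

```elisp
(defun my/replace-word-underscore-as-word (from to)
  "Replace whole-word FROM with TO in the current buffer, treating `_'
as a word constituent only while the replacement runs.
Hypothetical helper, for illustration only."
  (let ((table (copy-syntax-table (syntax-table))))
    ;; "w" = word constituent; the buffer's own table is not modified.
    (modify-syntax-entry ?_ "w" table)
    ;; with-syntax-table restores the original table afterwards.
    (with-syntax-table table
      (save-excursion
        (goto-char (point-min))
        (let ((re (concat "\\b" (regexp-quote from) "\\b")))
          (while (re-search-forward re nil t)
            (replace-match to t t)))))))

;; Usage:
(with-temp-buffer
  (insert "foo word_foo_word foo")
  (my/replace-word-underscore-as-word "foo" "bar")
  (buffer-string))
;; => "bar word_foo_word bar"
```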
