Reputation: 283

vim regex match with square brackets not working

Using vim, I am trying to convert the following two lines

  output reg [1:0] abcd,
  output reg efgh,

into

abcd
efgh

I am using the regular expression,

:%s/\voutput|reg|\s*|\[.*\]|,//g

But, I am getting the output as,

[1:0]abcd,
efgh,

Appreciate any help! Thanks.

Upvotes: 5

Answers (6)

jthill

Reputation: 60275

:help pattern gives the reason (although it helps, a lot, to have guessed the reason from prior exposure to the different possibilities :-)

1. A pattern is one or more branches, separated by "\|". It matches anything that matches one of the branches. Example: "foo\|beep" matches "foo" and matches "beep". If more than one branch matches, the first one is used.
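To make "the first one is used" concrete, here is a small sketch in Python, whose re module also tries branches in order. The branches "foo" and "foobar" and the helper names are made up for illustration, and the leftmost-longest behaviour is only emulated by trying each branch separately and keeping the longest hit.

```python
import re

def first_match(branches, text):
    """Ordered alternation: the first branch that matches at position 0 wins."""
    m = re.match("|".join(branches), text)
    return m.group(0) if m else None

def longest_match(branches, text):
    """Leftmost-longest emulation: try every branch, keep the longest hit."""
    hits = []
    for branch in branches:
        m = re.match(branch, text)
        if m:
            hits.append(m.group(0))
    return max(hits, key=len) if hits else None

branches = ["foo", "foobar"]
print(first_match(branches, "foobar"))    # foo
print(longest_match(branches, "foobar"))  # foobar
```

With ordered alternation the result depends on how the branches are listed; with leftmost-longest it does not.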

Vim's regex matcher is a first-match engine. POSIX mandates leftmost-longest. Purists might argue that anything else isn't a regex matcher at all, but only a "pattern matcher", which may have something to do with vim calling them "patterns" ... sed is leftmost-longest, and perl, although it also tries branches in order, requires the match that follows an empty match at the same position to be non-empty, so both give the result you wanted:

$ sed -r 's/output|reg|\s*|\[.*\]|,//g' @@
abcd
efgh

$ perl -ple 's/output|reg|\s*|\[.*\]|,//g' @@
abcd
efgh

but in Vim you have to do things a little differently. Reorder your alternatives and it works:

:%s/\voutput|reg|\[.*\]|,|\s*//g

And replacing \s* with \s+ makes it insensitive to order:

:%s/\voutput|reg|\s+|\[.*\]|,//g
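The order-insensitivity claim can be checked by brute force. The sketch below uses Python's re module as a stand-in (it uses Python syntax rather than Vim's \v, and the names LINES, BRANCHES and strip_line are illustrative). It tries every ordering of the five branches on the two lines from the question. It says nothing about Vim's internals; it only shows that once no branch can match the empty string, and no two branches can start on the same character, the order stops mattering.

```python
import re
from itertools import permutations

LINES = ["  output reg [1:0] abcd,", "  output reg efgh,"]
BRANCHES = ["output", "reg", r"\s+", r"\[.*\]", ","]

def strip_line(branches, line):
    """Delete every match of the alternation built from `branches`."""
    return re.sub("|".join(branches), "", line)

# Collect the distinct outcomes over all 120 orderings of the branches.
results = {
    tuple(strip_line(order, line) for line in LINES)
    for order in permutations(BRANCHES)
}
print(results)  # {('abcd', 'efgh')}
```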

Vim's g flag seems to replace every occurrence of just the first matching branch and then retry, until nothing changes.

Just to be complete and confusing,

:%s/\v(reg|output|\s*|\[.*\]|,)*//

abcd,
efgh,

and

:%s/\v(reg|output|\s*|\[.*\]|,)*//g

abcd
efgh

which for a brief moment actually made sense to me given the rules deduced above.

(edit: gawk's gensub and nvi's extended engine are also apparently leftmost-longest)

Upvotes: 5

cgledezma

Reputation: 622

The problem in your regexp is the \s* branch. It literally means "zero or more whitespace characters", so it also matches the empty string, and therefore it matches at every position. Since the whole regexp is one big OR, Vim tries the branches in order at each position: if neither output nor reg matches there, \s* always succeeds (possibly with an empty match), so the branches listed after it are never tried. To verify this, move \s* to a different position: you will get different results, because only the branches listed before \s* ever take effect.

I believe the regexp you actually wanted was:

:%s/\voutput|reg|\s+|\[.*\]|,//g

This tells Vim to replace only places where there is at least one whitespace character. This worked fine for me.
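The difference between the two quantifiers can be seen at the exact point where the original command goes wrong. This is a sketch in Python, whose re module also tries branches in order; the variable names are illustrative and the patterns are written in Python syntax rather than Vim's \v.

```python
import re

# The rest of the first line once "output reg " has been consumed.
text = "[1:0] abcd,"

with_star = re.match(r"output|reg|\s*|\[.*\]|,", text)
with_plus = re.match(r"output|reg|\s+|\[.*\]|,", text)

# \s* succeeds with an empty match, so \[.*\] is never tried.
print(repr(with_star.group(0)))  # ''
# \s+ fails at "[", so the engine moves on to \[.*\].
print(repr(with_plus.group(0)))  # '[1:0]'
```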

Upvotes: 1

FDinoff

Reputation: 31429

Reason why your regex didn't work.

It seems vim reads the regex left to right and tries to match each section of the union in order.

So in output|reg|\s*|\[.*\]|, the \[.*\] branch is never reached, because \s* matches the empty string, which exists between every pair of characters. Since the vim regex engine has matched something, it immediately does the replace.

If you just reorder the alternatives so that \s* is last, the regex works as expected.

So the command should be :%s/\voutput|reg|\[.*\]|,|\s*//g
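The explanation above can be turned into a toy model. This is a sketch in Python, not Vim's actual implementation: the function vim_like_sub is hypothetical, it relies on Python's re module trying branches in order, and it assumes that after an empty match the scanner keeps the current character and moves one position forward. Under those assumptions it reproduces both the output from the question and the fixed output.

```python
import re

def vim_like_sub(pattern, line):
    """Toy model of :s/pattern//g on a single line (an assumption, not Vim code)."""
    regex = re.compile(pattern)
    kept = []
    pos = 0
    while pos < len(line):
        m = regex.match(line, pos)
        if m and m.end() > pos:
            pos = m.end()            # non-empty match: delete it
        else:
            kept.append(line[pos])   # empty match or no match: keep the character
            pos += 1
    return "".join(kept)

line = "  output reg [1:0] abcd,"
print(vim_like_sub(r"output|reg|\s*|\[.*\]|,", line))  # [1:0]abcd,
print(vim_like_sub(r"output|reg|\[.*\]|,|\s*", line))  # abcd
```

In the first call the empty match of \s* wins at "[" and at ",", so those characters survive; in the second call \[.*\] and , are tried before \s*, so they are deleted.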

Upvotes: 1