Reputation: 283
Using vim, I am trying to convert the following two lines
output reg [1:0] abcd,
output reg efgh,
into
abcd
efgh
I am using the regular expression,
:%s/\voutput|reg|\s*|\[.*\]|,//g
But, I am getting the output as,
[1:0]abcd,
efgh,
Appreciate any help! Thanks.
Upvotes: 5
Views: 4813
Reputation: 60275
:help pattern
gives the reason (although it helps, a lot, to have guessed the reason from prior exposure to the different possibilities :-)
1. A pattern is one or more branches, separated by "\|". It matches anything
that matches one of the branches. Example: "foo\|beep" matches "foo" and
matches "beep".
If more than one branch matches, the first one is used.
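The "first one is used" rule is easy to reproduce outside Vim. Here is a minimal sketch in Python (chosen only for illustration; `first_match` and `longest_match` are hypothetical helper names, and Python's `re` syntax writes the branches without Vim's backslashes). It contrasts taking the first branch that matches with taking the longest match among all branches:

```python
import re

def first_match(branches, text):
    """Return what the first branch that matches at the start of text consumes."""
    for branch in branches:
        m = re.match(branch, text)
        if m:
            return m.group()
    return None

def longest_match(branches, text):
    """Return the longest text that any branch matches at the start of text."""
    hits = [m.group() for m in (re.match(b, text) for b in branches) if m]
    return max(hits, key=len) if hits else None

# Two of the branches from the question, in the order the question used them.
branches = [r"\s*", r"\[.*\]"]
print(repr(first_match(branches, "[1:0] abcd,")))    # ''
print(repr(longest_match(branches, "[1:0] abcd,")))  # '[1:0]'
```

Under the first-match rule the `\s*` branch wins with an empty match, so the bracket branch is never consulted; a longest-match rule picks `[1:0]` instead.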
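Vim's regex matcher is a first-match engine. POSIX mandates leftmost-longest. Purists might argue that anything else isn't a regex matcher at all, but only a "pattern matcher", which may have something to do with vim calling them "patterns" ... sed is leftmost-longest, and perl, although it also takes the first alternative that matches, will not accept a second empty match at the same position and so falls through to the later branches. Both give the expected result: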
$ sed -r 's/output|reg|\s*|\[.*\]|,//g' @@
abcd
efgh
$ perl -ple 's/output|reg|\s*|\[.*\]|,//g' @@
abcd
efgh
but with a first-match engine you have to do things a little differently. Reorder your alternatives, it works:
:%s/\voutput|reg|\[.*\]|,|\s*//g
And replacing \s* with \s+ makes it insensitive to order:
:%s/\voutput|reg|\s+|\[.*\]|,//g
Vim's g flag seems to replace every occurrence of just the first matching branch and then retry, until nothing changes.
Just to be complete and confusing,
:%s/\v(reg|output|\s*|\[.*\]|,)*//
abcd,
efgh,
and
:%s/\v(reg|output|\s*|\[.*\]|,)*//g
abcd
efgh
which for a brief moment actually made sense to me given the rules deduced above.
(edit: gawk's gensub and nvi's extended engine are also apparently leftmost-longest)
Upvotes: 5
Reputation: 622
The problem in your regexp is the \s* branch. It literally means "zero or more whitespace characters", so it also matches the empty string, and it can do that at every position. Since the whole regexp is one big OR and Vim takes the first branch that matches, any branch listed after \s* is never tried: wherever output and reg do not match, \s* succeeds (possibly matching nothing), Vim moves on, and the process repeats from the beginning of the OR. To verify this, note that moving \s* to a different position gives different results: only the branches listed before \s* ever take effect.
I believe the regexp you actually wanted was:
:%s/\voutput|reg|\s+|\[.*\]|,//g
This says that you only want to replace places where there is at least one whitespace character. This worked fine for me.
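As a quick sanity check that \s+ removes the dependence on branch order, here is a small sketch that tries every ordering of the five branches. It uses Python's `re` module as a stand-in (an assumption on my part; it is not Vim's engine, but it also takes the first alternative that matches), and `strip_line` is a hypothetical helper name:

```python
import re
from itertools import permutations

lines = ["output reg [1:0] abcd,", "output reg efgh,"]
branches = [r"output", r"reg", r"\s+", r"\[.*\]", r","]

def strip_line(order, line):
    """Delete every match of the alternation built from the branches in `order`."""
    return re.sub("|".join(order), "", line)

# Collect the distinct outcomes over all 120 branch orderings.
results = {
    tuple(strip_line(order, line) for line in lines)
    for order in permutations(branches)
}
print(results)  # {('abcd', 'efgh')}
```

Because none of the branches can match the empty string any more, no branch can shadow another at a given position, and every ordering gives the same result.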
Upvotes: 1
Reputation: 31429
Reason why your regex didn't work.
It seems vim reads the regex left to right and tries to match each branch of the alternation in order.
So in output|reg|\s*|\[.*\]|, the \[.*\] branch is never reached, because \s* matches the empty string, which sits between every pair of characters. Since the vim regex engine matched something, it immediately does the replace.
If you just reorder the branches so that \s* is last, the regex works as expected.
So the command should be :%s/\voutput|reg|\[.*\]|,|\s*//g
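To make the mechanism concrete, here is a rough model of the substitute loop in Python. It is only a sketch under stated assumptions, not Vim's actual implementation: `substitute_all` is a hypothetical name, Python's `re` supplies the "first alternative that matches" behaviour, and an empty match is assumed to keep the current character and move on one position.

```python
import re

def substitute_all(pattern, line):
    """Rough model of :s/pattern//g where an empty match keeps one character."""
    regex = re.compile(pattern)
    out = []
    pos = 0
    while pos < len(line):
        m = regex.match(line, pos)
        if m and m.end() > pos:
            pos = m.end()          # non-empty match: delete it
        else:
            out.append(line[pos])  # empty match or no match: keep this character
            pos += 1
    return "".join(out)

line = "output reg [1:0] abcd,"
print(substitute_all(r"output|reg|\s*|\[.*\]|,", line))  # [1:0]abcd,
print(substitute_all(r"output|reg|\[.*\]|,|\s*", line))  # abcd
```

With \s* in the middle, the model reproduces the output reported in the question; with \s* moved to the end, the bracket and comma branches get their turn first and the line reduces to the bare name.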
Upvotes: 1
Reputation: 8819
This works (it looks for 4 alphabetic characters just before the trailing comma):
%s/^.*\<\(\a\{4}\),\s*$/\1/g
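For anyone who wants to try the same idea outside Vim, here is a rough Python counterpart. It is a sketch, not an exact translation: Python's `\b` stands in for Vim's `\<`, `[A-Za-z]` for `\a`, and `clk` is a hypothetical signal name added to show what happens when the name is not exactly four letters long.

```python
import re

# Rough Python counterpart of %s/^.*\<\(\a\{4}\),\s*$/\1/
pattern = re.compile(r"^.*\b([A-Za-z]{4}),\s*$")

for line in ["output reg [1:0] abcd,", "output reg efgh,", "output reg clk,"]:
    print(pattern.sub(r"\1", line))
# abcd
# efgh
# output reg clk,
```

The third line comes back unchanged, so this approach depends on every name being exactly four letters long.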
Upvotes: 0
Reputation: 195039
$xbd0
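will do the job on one line ($ moves to the end of the line, x deletes the trailing comma, b jumps back to the start of the name, and d0 deletes everything before it). You could record a macro to do it on multiple lines automatically.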
Upvotes: 3