Reputation: 53
I'm looking to use vim to extract only the square brackets and the number inside from a file containing the following example text:
13_[4]_3_[4]_[1]_5_[1]_29_[3]_4_[2]_9_[1]_6_[2]_4
14_[4]_28_[3]_4_[2]_12_[1]_8_[2]_2
[1]_[4]_15_[1]_16_[3]_4_[2]_11_[1]_16_[2]_2
9_[4]_3_[4]_3_[4]_9_[4]_4_[4]_7_[1]_12_[3]_4_[2]_9_[1]_[2]_2
14_[4]_30_[3]_4_[2]_5_[1]_19_[1]_3_[1]_8_[2]_10_[1]_4_[1]_3_[1]_2
So for the first example line I would like an output line that looks like: [4][4][1][1][3][2][1][2].
I can easily delete the square brackets with:
:%s/\[\d\]//g
but I am having real trouble trying to delete all text that doesn't match \[\d\]. Most vim commands that work with negation (e.g. :v) appear to only operate on the whole line rather than individual strings, and using %s with group matching:
:%s/\v(.*)([\d])(.*)/\2
also matches and deletes the square brackets.
Would someone have a suggestion to solve my problem?
Upvotes: 5
Views: 2711
Reputation: 45087
You were close. You need to escape the square brackets and use something far less greedy than .*.
:%s/\v[^[]*(\[\d\])[^[]*/\1/g
This matches leading text + [ + digit + ] + trailing text, capturing only the [ + digit + ]. The whole match is then replaced with the capture group, leaving only the brackets and digits.
\v - very magic. See :h magic.
[...] - a bracketed character class, which matches any one of the characters inside. e.g. fooba[rs] matches foobar and foobas, but not foobaz. See :h /\[. (Note: Vim may call this a collection.)
[^...] - a negated bracketed character class, which matches any one character that is not listed inside the brackets. e.g. fooba[^rz] matches foobas, but not foobaz or foobar.
[^[] - match any non-[ character. (This looks funny.)
[^[]* - match a non-[ character zero or more times. This matches the leading text we want to remove.
(...) - capture group.
\[ & \] - literal [ / ]. We must escape them to prevent a character class.
\d - match 1 digit.
[^[]* - match the trailing text to be removed.
\1 - the replacement is our capture group, aka the bracketed digit.
g - flag to do this globally, or more plainly, multiple times per line.
% - the range for the substitution, :s, covering the entire file, i.e. 1,$.
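To check the pattern outside Vim, here is a minimal sketch using Python's `re` module. It is only an approximation of the Vim command: Python has no very magic mode, so the pattern is respelled in Python syntax, and the helper name `keep_brackets` is made up for illustration. The sample lines are copied from the question.

```python
import re

# Two sample lines copied from the question.
lines = [
    "13_[4]_3_[4]_[1]_5_[1]_29_[3]_4_[2]_9_[1]_6_[2]_4",
    "[1]_[4]_15_[1]_16_[3]_4_[2]_11_[1]_16_[2]_2",
]

# Python spelling of the Vim pattern \v[^[]*(\[\d\])[^[]*
#   [^\[]*    leading run of non-"[" characters
#   (\[\d\])  captured "[digit]"
#   [^\[]*    trailing run of non-"[" characters
PATTERN = re.compile(r"[^\[]*(\[\d\])[^\[]*")


def keep_brackets(line):
    """Hypothetical helper: replace every match with its capture group,
    which mirrors the \\1 replacement combined with the g flag."""
    return PATTERN.sub(r"\1", line)


results = [keep_brackets(line) for line in lines]
for result in results:
    print(result)
```

For the first sample line this should print [4][4][1][1][3][2][1][2], the output asked for in the question.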
Why does :%s/\v(.*)([\d])(.*)/\2 fail?
tl;dr: Your pattern doesn't match. Try /[\d].
Long version:
1. The first .* will capture too much, leaving only the last portion, e.g. [2]...
2. [\d] creates a bracketed character class that matches one of the following characters: d or \.
3. The last .* suffers from the same problem as the first when using the g flag.
4. The g flag is missing. This means the command will only do 1 substitution per line, which will leave plenty of text.
When working with a tricky regex pattern it is often best to start with a search, /, instead of a substitution. This allows you to see where the matches are beforehand. You can tweak your search via / and pressing <up> or <c-p>. Or even better, use q/ to open the command-line-window so you can edit your pattern like any other text. You can also use <c-f> while on the command line (including /) to bring up the command-line-window.
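The same "look at the matches first" habit can be sketched outside Vim. The snippet below is a rough Python analogue, not a Vim feature: `preview` is a hypothetical helper that lists what each match would cover. Note that in Python `[\d]` is a digit class, unlike the Vim behaviour described above, so the brackets are escaped here and the sketch isolates only the greedy `.*` problem.

```python
import re

# Second sample line from the question.
line = "14_[4]_28_[3]_4_[2]_12_[1]_8_[2]_2"


def preview(pattern, text):
    """Hypothetical helper: return the full text of every match,
    roughly what highlighted search results would show."""
    return [m.group(0) for m in re.finditer(pattern, text)]


# Greedy version: the first (.*) swallows everything up to the last "[digit]".
greedy = preview(r"(.*)(\[\d\])(.*)", line)

# Tighter version: each match stops before the next "[".
tight = preview(r"[^\[]*(\[\d\])[^\[]*", line)

print(len(greedy), greedy)
print(len(tight), tight)
```

With the greedy pattern there is a single match spanning the whole line, so only one bracketed digit can survive a substitution. The tighter pattern yields one match per bracketed digit.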
Once you have your pattern, you can start your substitution. Vim provides a shortcut for reusing the current search: leave the pattern empty, e.g. :%s//\1/g.
This technique, especially combined with set incsearch and set hlsearch, means you can see your matches interactively before you do your substitutions. This technique is shown in the following Vimcast episode: Refining search patterns with the command-line window.
Need to learn more regex syntax? See :h pattern. It is a very long and dense read, but will greatly aid you in the future. I also find reading Perl's regex documentation via perldoc perlre to be a good place to look as well. Note: Perl's regexes are different from Vim's regexes (see :h perl-patterns), but Perl Compatible Regular Expressions (PCRE) are very common.
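You may also consider grep -o, e.g. :%!grep -o '\[[0-9]\]' (plain grep does not understand \d; GNU grep accepts it only with -P). Be aware that grep -o prints each match on its own line, so the matches from one input line are not kept together.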
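The idea behind grep -o, collecting the matches instead of deleting everything around them, can also be sketched in Python. This is an illustrative alternative rather than anything Vim or grep does itself, and `extract_brackets` is a made-up name. Joining the matches per line keeps one output line per input line.

```python
import re


def extract_brackets(line):
    """Hypothetical helper: collect every "[digit]" and join them
    back together, so each input line yields one output line."""
    return "".join(re.findall(r"\[\d\]", line))


# Second sample line from the question.
sample = "14_[4]_28_[3]_4_[2]_12_[1]_8_[2]_2"
extracted = extract_brackets(sample)
print(extracted)
```

This approach needs no leading or trailing "junk" sub-patterns, since anything that is not matched is simply never collected.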
:h :s
:h range
:h magic
:h /\[
:h /\(
:h s/\1
:h /\d
:h :s_flags
:h 'hlsearch'
:h 'incsearch'
:h q/
:h command-line-window
:h :range!
Upvotes: 4