Steffen

Reputation: 53

Using vim to replace not matching strings that occur a variable number of times

I'm looking to use vim to extract only the square brackets and the number inside from a file containing the following example text:

13_[4]_3_[4]_[1]_5_[1]_29_[3]_4_[2]_9_[1]_6_[2]_4
14_[4]_28_[3]_4_[2]_12_[1]_8_[2]_2
[1]_[4]_15_[1]_16_[3]_4_[2]_11_[1]_16_[2]_2
9_[4]_3_[4]_3_[4]_9_[4]_4_[4]_7_[1]_12_[3]_4_[2]_9_[1]_[2]_2
14_[4]_30_[3]_4_[2]_5_[1]_19_[1]_3_[1]_8_[2]_10_[1]_4_[1]_3_[1]_2

So for the first example line I would like an output line that looks like: [4][4][1][1][3][2][1][2].

I can easily delete the square brackets with:

:%s/\[\d\]//g

but I am having real trouble deleting all the text that doesn't match \[\d\]. Most Vim commands that work with negation (e.g. :v) appear to operate only on whole lines rather than on individual strings, and using :%s with group matching:

:%s/\v(.*)([\d])(.*)/\2

also matches and deletes the square brackets.

Would someone have a suggestion to solve my problem?

Upvotes: 5

Views: 2711

Answers (2)

Peter Rincker

Reputation: 45087

You were close. You need to quote the square brackets and use something far less greedy than .*.

:%s/\v[^[]*(\[\d\])[^[]*/\1/g

Overview

Match leading text + [ + digit + ] + trailing text, capturing the [ + digit + ]. Replace the match with the capture group, leaving only the brackets and digits.

Glory of details

  • Using \v for very magic. See :h magic
  • [...] is a bracketed character class, which matches any one of the characters inside, e.g. fooba[rs] matches foobar and foobas, but not foobaz. See :h /\[. (Note: Vim calls this a collection.)
  • [^...] is a negated bracketed character class, so it matches any single character that is not listed inside the brackets, e.g. fooba[^rz] matches foobas, but not foobaz or foobar.
  • [^[] - match any non-[ character. (This looks funny)
  • [^[]* - match any non-[ character zero or more times. This will match the leading text we want to remove.
  • (...) - capture group
  • \[ & \] represent literal [ / ]. We must escape them to prevent a character class.
  • \d - match one digit.
  • [^[]* - match trailing text to be removed
  • \1 the replacement will be our capture group aka bracketed digits.
  • Use the g flag to do this globally or more plainly multiple times.
  • Use a range of % to do a substitution, :s, over the entire file, 1,$.
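
The breakdown above can be checked without touching a buffer. The following is a minimal sketch that applies the same pattern, replacement and g flag through Vim's substitute() function; the variable names sample and result are only illustrative, and the sample string is the first line from the question.

```vim
" First example line from the question.
let sample = '13_[4]_3_[4]_[1]_5_[1]_29_[3]_4_[2]_9_[1]_6_[2]_4'

" Same pattern, replacement and g flag as the :%s command in this answer.
" Single quotes keep the backslashes literal.
let result = substitute(sample, '\v[^[]*(\[\d\])[^[]*', '\1', 'g')

echo result
" -> [4][4][1][1][3][2][1][2]
```

This is the output line the question asks for, so the pattern can then be used in :%s with some confidence.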

So why does :%s/\v(.*)([\d])(.*)/\2 fail?

tl;dr: Your pattern doesn't match. Try /[\d].

Long version:

  • The first .* will capture too much leaving only the last portion. e.g. [2]....
  • [\d] creates a bracketed character class that matches one of the following characters: d or \
  • The second .* suffers from the same problem as the first when using the g flag.
  • Why not 3 capture groups? You can certainly have more capture groups, but in this case they are unnecessary, so remove them.
  • Missing g flag. This means the command will only do 1 substitution per line which will leave plenty of text.
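
A rough way to see these points in action is with matchstr() and substitute(). This is only a sketch: the sample string is a shortened line in the style of the question's data, chosen here for illustration.

```vim
" Shortened sample in the style of the question's data (illustrative).
let sample = '14_[4]_28_[3]'

" [\d] is a collection holding a backslash and d, so it finds the d, not the 4.
let found = matchstr('4d', '\v[\d]')
echo found
" -> d

" The original pattern does not match the sample at all, so nothing changes.
let unchanged = substitute(sample, '\v(.*)([\d])(.*)', '\2', '')
echo unchanged
" -> 14_[4]_28_[3]

" With the brackets escaped, the greedy first .* swallows everything up to
" the last bracketed digit, so only that one survives.
let greedy = substitute(sample, '\v(.*)(\[\d\])(.*)', '\2', '')
echo greedy
" -> [3]
```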

General regex and substitution advice

When working with a tricky regex pattern it is often best to start with a search, /, instead of a substitution. This allows you to see where the matches are beforehand. You can tweak your search via / and pressing <up> or <c-p>. Or even better, use q/ to open the command-line-window so you can edit your pattern like any other text. You can also use <c-f> while on the command line (including /) to bring up the command-line-window.

Once you have your pattern, you can start your substitution. Vim provides a shortcut for reusing the current search: leave the pattern empty, e.g. :%s//\1/g.

This technique, especially combined with set incsearch and set hlsearch, means you can see your matches interactively before you do your substitutions. It is shown in the following Vimcast episode: Refining search patterns with the command-line window.
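
The search-first workflow can be sketched as a short script. It assumes a scratch buffer holding the second sample line from the question, and it sets the search register with let @/ as a scripted stand-in for typing the / search interactively.

```vim
" Scratch buffer with the second sample line from the question.
new
call setline(1, '14_[4]_28_[3]_4_[2]_12_[1]_8_[2]_2')

" Step 1: search first. Interactively you would type /\v[^[]*(\[\d\])[^[]*
" and inspect the highlighted matches; here the search register is set directly.
set incsearch hlsearch
let @/ = '\v[^[]*(\[\d\])[^[]*'

" Step 2: an empty pattern in :s reuses the last search pattern.
%s//\1/g

echo getline(1)
" -> [4][3][2][1][2]
```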

Need to learn more regex syntax? See :h pattern. It is a very long and dense read, but will greatly aid you in the future. I also find reading Perl's regex documentation via perldoc perlre to be a good place to look as well. Note: Perl's regexes are different from Vim's regexes (See :h perl-patterns), but Perl Compatible Regular Expressions (PCRE) are very common.

Thoughts

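You may also consider grep -o, e.g. :%!grep -o '\[[0-9]\]'. (grep's default regex syntax does not understand \d, so use [0-9] or GNU grep's -P option.) Note that grep -o prints each match on its own line, so the matches would still need to be joined to get one output line per input line.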

More help

:h :s
:h range
:h magic
:h /\[
:h /\(
:h s/\1
:h /\d
:h :s_flags
:h 'hlsearch'
:h 'incsearch'
:h q/
:h command-line-window
:h :range!

Upvotes: 4

Sato Katsura

Reputation: 3086

Another way to do it:

:%s/\v[^[]*(%(\[\d\])?)/\1/g
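
As a quick check of this variant, here is a minimal sketch using substitute() on the second sample line from the question (the variable names are illustrative). In very magic mode, %(...) is a non-capturing group and the ? makes the bracketed digit optional, so trailing text with no bracket after it is still consumed and replaced by an empty capture.

```vim
" Second example line from the question.
let sample = '14_[4]_28_[3]_4_[2]_12_[1]_8_[2]_2'

" Leading non-[ text is dropped; the optional bracketed digit is kept via \1.
let result = substitute(sample, '\v[^[]*(%(\[\d\])?)', '\1', 'g')

echo result
" -> [4][3][2][1][2]
```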

Upvotes: 1
