OldBuildingAndLoan

Reputation: 3022

vim regex search csv string and paste matches


I need advice on the best way to search with a regex in vim and extract any matches it finds.


I have a CSV file with two fields that looks something like this:

0g98932,"long description sometimes containing numbers like 1234567, or 0000012345 and even BR00012345 but always containing text"

I need to search the description field on each row. If a number matching \d{10} exists in the second field, I want to pull it out.

Doing something like :% s/(\d{10})/^$1/g gives me a "Pattern not found (\d{10})" error.

I've never learned how to capture and reference a match from a regex search in vim, so that's part of the problem.

The other part:

I would really like to do either of the following:

  1. Delete everything other than the leading 7-character id and the matches.
  2. Copy the id and the matches to another file or to the top of the current file (anywhere, really, just to separate the matches from the unfiltered data).

Upvotes: 2

Views: 3702

Answers (2)

rampion

Reputation: 89073

The important thing to know about vim regexes is that they require different levels of escaping than, say, regexes in Perl or Ruby.

From :help /\m:

after:    \v     \m       \M        \V    matches
                 'magic'  'nomagic'
          $      $        $         \$    matches end-of-line
          .      .        \.        \.    matches any character
          *      *        \*        \*    any number of the previous atom
          ()     \(\)     \(\)      \(\)  grouping into an atom
          |      \|       \|        \|    separating alternatives
          \a     \a       \a        \a    alphabetic character
          \\     \\       \\        \\    literal backslash
          \.     \.       .         .     literal dot
          \{     {        {         {     literal '{'
          a      a        a         a     literal 'a'
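To see why the question's pattern failed, it can help to compare against a Perl-style engine. The sketch below uses Python's re module purely as a stand-in (it is not vim); its syntax behaves roughly like the \v column above, where bare ( ) group and {10} is a count. It also spells out what the default 'magic' mode made of (\d{10}): a literal "(", one digit, the literal text "{10}" and a literal ")", which the sample row does not contain. Going by the table, prefixing the vim pattern with \v should give vim the Perl-style reading.

```python
import re

# Sample row from the question.
row = ('0g98932,"long description sometimes containing numbers like '
       '1234567, or 0000012345 and even BR00012345 but always containing text"')

# Perl-style reading (like vim's \v): ( ) form a group, {10} is a repeat count.
perl_style = r"(\d{10})"

# How vim's default 'magic' mode read the question's (\d{10}):
# literal "(", one digit, the literal text "{10}", then a literal ")".
magic_reading = r"\(\d\{10\}\)"

print(re.search(perl_style, row).group(1))  # 0000012345
print(re.search(magic_reading, row))        # None, i.e. "Pattern not found"
```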

The default setting is 'magic', so to make the regex you gave work, you'd have to use:

:%s/".*\(\d\{10}\).*"/\1/
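As a cross-check, here is the same substitution translated to Perl-style syntax and run with Python's re module (again only a stand-in for vim): \( \) become ( ), \{10} becomes {10}, and the replacement \1 keeps its meaning. Everything from the first quote to the last quote is replaced by the captured number, so only the id and the number survive.

```python
import re

row = ('0g98932,"long description sometimes containing numbers like '
       '1234567, or 0000012345 and even BR00012345 but always containing text"')

# vim (magic):  ".*\(\d\{10}\).*"   replacement: \1
# Perl-style:   ".*(\d{10}).*"      replacement: \1
VIM_EQUIVALENT = r'".*(\d{10}).*"'

result = re.sub(VIM_EQUIVALENT, r"\1", row)
print(result)  # 0g98932,0000012345

# A row without a 10-digit number is left untouched, which is why
# lines without a match still need to be deleted separately.
print(re.sub(VIM_EQUIVALENT, r"\1", '0g98932,"long description no numbers"'))
```

Because .* is greedy, a description containing several 10-digit numbers keeps only the last one.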

If you want to delete everything other than the leading 7-character id and the matches (by which I assume you also mean deleting the lines that have no match), use:

:v/^\([[:alnum:]]\{7}\),\s*".*\(\d\{10}\).*/d
:%s//\1,\2/

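The :v/<pattern>/ command runs a command on each line that doesn't match the given pattern, so here it simply deletes the non-matching lines. An empty pattern in :s// reuses the last search pattern (here the one from :v), so we don't have to repeat it; \1 and \2 are the id and the 10-digit number captured by the two groups.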

This transforms the following:

0g98932,"long description sometimes containing numbers like 0123456789"
0g98932,"long description no numbers"
0g98932,"long description no numbers"
0g98932,"long description sometimes containing numbers like 0123456789"
0g98932,"long description no numbers"
0g98932,"long description no numbers"
0g98932,"long description no numbers"
0g98932,"long description no numbers"
0g98932,"long description sometimes containing numbers like 0123456789"
0g98932,"long description no numbers"
0g98932,"long description no numbers"
0g98932,"long description sometimes containing numbers like 0123456789"

into this:

0g98932,0123456789
0g98932,0123456789
0g98932,0123456789
0g98932,0123456789
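The two-step :v/.../d then :%s//\1,\2/ transformation can be mimicked outside vim to check the expected result. This is a minimal sketch using Python's re module, not vim itself; filter_and_extract is a hypothetical helper name, and [[:alnum:]] is translated as ASCII-only [A-Za-z0-9], which is an assumption about the data.

```python
import re

# Perl-style translation of the vim pattern
#   ^\([[:alnum:]]\{7}\),\s*".*\(\d\{10}\).*
# Assumption: [[:alnum:]] is treated as ASCII [A-Za-z0-9].
PATTERN = re.compile(r'^([A-Za-z0-9]{7}),\s*".*(\d{10}).*')


def filter_and_extract(lines):
    """Hypothetical helper: drop non-matching lines, rewrite matches as 'id,number'."""
    out = []
    for line in lines:
        m = PATTERN.match(line)
        if m is None:
            continue  # like :v/pattern/d, which deletes lines that do not match
        # The pattern spans the whole line, so :s//\1,\2/ leaves just "id,number".
        out.append(f"{m.group(1)},{m.group(2)}")
    return out


sample = [
    '0g98932,"long description sometimes containing numbers like 0123456789"',
    '0g98932,"long description no numbers"',
    '0g98932,"long description sometimes containing numbers like 0123456789"',
]
for row in filter_and_extract(sample):
    print(row)  # 0g98932,0123456789 (printed twice)
```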

Upvotes: 6

Mykola Golubyev

Reputation: 59844

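To capture a match, wrap it in an escaped group:

\(pattern\)

To delete everything else on the line, match the surrounding text as well and replace the whole match with the captured group:

:%s/not_pattern\(pattern\)another_not_pattern/\1/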
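The same template can be sketched with Python's re module (a Perl-style stand-in for vim, so the group is written ( ) rather than \( \)). keep_only is a hypothetical helper that builds not_pattern(pattern)another_not_pattern and replaces the whole match with group 1; the example patterns below are illustrative choices, not taken from the answer.

```python
import re


def keep_only(line, not_pattern, pattern, another_not_pattern):
    r"""Hypothetical helper mirroring :s/not_pattern\(pattern\)another_not_pattern/\1/."""
    regex = f"{not_pattern}({pattern}){another_not_pattern}"
    return re.sub(regex, r"\1", line)


row = '0g98932,"long description with 0000012345 inside"'

# Keep only the id: drop the first comma and everything after it.
print(keep_only(row, r"^", r"[^,]*", r",.*$"))  # 0g98932

# Keep the id and the number: drop the quoted text around the 10 digits.
print(keep_only(row, r'".*', r"\d{10}", r'.*"'))  # 0g98932,0000012345
```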

Upvotes: 3
