Juan Luis Chulilla

Reputation: 368

How can I build a regexp in vim which searches for words which do not begin with "@"?

I would like to extract the usernames from a long text file built from Twitter posts. I have tried with expressions such as

:%s#\([^@].\{-}\) ##g
:%s#\(\<[^@].\{-}\>\) ##g

but neither works. I read Vim's documentation for @, but, as far as I can tell, it covers an escaped @ (\@), not a plain @.

How would I build an expression which erases the words which do not begin with "@"?

Upvotes: 0

Views: 118

Answers (3)

Peter Rincker

Reputation: 45177

Your question asks "How can I remove everything not matching some pattern?".

I want to answer "How do I capture all the matches (and delete the contents of buffer and paste the matches)?"

Why not "remove everything not matching some pattern"?

Regexes are good at matching patterns; expressing what should not match is trickier. Sure, sometimes you can use negative look-aheads and look-behinds, but not every case is that straightforward. Matching exactly what you want is far easier. However, if you do want to do it, here is as close as I can get without breaking my brain:

:%s/.\&\(@\w*\)\@<![^@]//g

Note: this leaves trailing spaces and blank lines

Overview

The idea is to capture each match via :s and, in the replacement, execute an expression that builds up the matches in a register. Then delete all the lines with :d and paste the register with the matches back into the buffer.

The How

:let @a = ""
:%s/@\w\+/\=setreg('A', submatch(0), 'l')/gn
:%d_
:%pu a
:1d_

Glory of details

  • Clear the a register via let @a = ""
  • Match the twitter users via @\w\+ pattern
  • Use \= inside the replacement of the :s to execute an expression
  • use setreg() to set the value of the register
  • using a capital register will append instead of replace
  • submatch(0) yields the matched content
  • using the 3rd parameter value of 'l' specifies to append matches line-wise
  • using the n flag will prevent the buffer from being altered (optional); add the g flag to capture every match on a line, not just the first
  • :%d_ delete entire buffer to the black hole register
  • :%pu a will put the a register after the last line of the buffer
  • :1d_ will remove the empty first line

Well that is great but it is so much to type...

It may be a bunch to type compared to :%!grep -E -o '@\w+', but it is a pure Vim solution. We can shorten it into a single line, if that would be better:

:let @a = "" | %s/@\w\+/\=setreg('A', submatch(0), 'l')/gn | %d_ | %pu a | 1d_

That is probably not much better if you have to do anything like this on a regular basis. Here is a quick and dirty command to put in your ~/.vimrc file:

" Extractomatic
" Replace the current buffer with each match on a separate line
" Usage:
"     :Extractomatic/pattern/
command! -nargs=+ Extractomatic
      \ let s:var = @a |
      \ let  @a = "" |
      \ %s<args>\=setreg('A', submatch(0), 'l')/gn |
      \ %d_ |
      \ %pu a |
      \ 1d_ |
      \ let @a = s:var

Now you can just do :Extractomatic/@\w\+/.

However, there are more robust solutions to this, such as Ingo Karkat's Extract Matches plugin and the Yankitute plugin.

Conclusion

Personally, I think whichever way you choose to solve this problem is fine. However, knowing how to use :s with a sub-replace-expression is a great way to level up your vim-script-fu.

More help

:h :s
:h sub-replace-expression
:h submatch(
:h setreg(
:h registers
:h :d
:h :pu
:h range

Upvotes: 0

anubhava

Reputation: 786091

You can use this regex in vim:

@\@<!\<\w\+\>

This will match all words that are not preceded by an @ character.

To match runs of non-space characters not preceded by an @ character, use:

@\@<!\<\S\+\>

\@<! is Vim's syntax for a negative lookbehind; @\@<! is the equivalent of (?<!@) in PCRE-style regex flavors.
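
For comparison outside Vim, here is a rough sketch of the same negative lookbehind in PCRE syntax. It assumes GNU grep built with PCRE support (the -P option); the function name words_without_at and the sample text are made up for illustration. \b only approximates Vim's \< and \>.

```shell
# Hypothetical helper: print every word that is NOT directly preceded by "@".
# Assumes GNU grep with -P (PCRE) support; this is not portable to every grep.
words_without_at() {
  # (?<!@) is the PCRE negative lookbehind, roughly Vim's @\@<!
  # \b\w+\b approximates Vim's \<\w\+\>
  grep -o -P '(?<!@)\b\w+\b' "$@"
}

# Usage on invented sample text: "bob" is skipped because "@" precedes it.
printf '%s\n' 'hello @bob world' | words_without_at
```

Fed to a substitution instead of a search, the same pattern would select the words to erase rather than the usernames to keep.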

Upvotes: 3

Conffusion

Reputation: 4485

I don't know why you want to do this in Vim. I assume you are on a Unix/Linux OS, since you mention Vim. Thanks to the question "extract words from a file", I found the following solution:

grep -o -E '@\w+' twitterlog.txt > usernames.txt
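
A minimal sketch of how that command behaves, wrapped in a function so it can be tried on piped input. The function name extract_mentions and the sample tweets are invented for illustration, and \w inside grep -E is assumed to be available (it is a GNU extension).

```shell
# Hypothetical wrapper around the grep call above.
# -o prints each match on its own line; -E enables extended regex syntax.
extract_mentions() {
  grep -o -E '@\w+' "$@"
}

# Usage on invented sample text; lines without a mention produce no output.
# sort -u is an optional addition that drops repeated usernames.
printf '%s\n' \
  'RT @alice: thanks @bob_42 for the link' \
  'no mentions on this line' \
  'cc @alice and @carol' | extract_mentions | sort -u
```

Because -o emits one match per output line, several usernames on the same input line are all captured.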

Upvotes: 0
