Reputation: 368
I would like to extract the usernames from a long text file built from Twitter posts. I have tried expressions such as
:%s#\([^@].\{-}\) ##g
:%s#\(\<[^@].\{-}\>\) ##g
but they don't work. I read Vim's documentation for @, but as far as I know it applies to an escaped @ (that is, \@), not a plain @.
How would I build an expression that erases the words which do not begin with "@"?
Upvotes: 0
Views: 118
Reputation: 45177
Your question asks "How can I remove everything not matching some pattern?".
Instead, I want to answer "How do I capture all the matches (and then delete the contents of the buffer and paste the matches)?"
Regexes are good at matching patterns; not matching is trickier. Sometimes you can use negative look-aheads and look-behinds, but not every case is so straightforward, and matching exactly what you want is far easier. However, if you do want to do it, here is as close as I can get without breaking my brain:
:%s/.\&\(@\w*\)\@<![^@]//g
It deletes every character that is not an @ and is not immediately preceded by an @ followed by zero or more word characters.
Note: this leaves trailing spaces and blank lines.
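For comparison outside Vim, here is a minimal Python sketch of the same "delete what doesn't match" idea. It is not a port of the Vim pattern: Python's re module only supports fixed-width look-behind, so \(@\w*\)\@<! has no direct equivalent. Instead it swaps in a different technique, alternation with a callback, which matches either a whole @username or any single other character and keeps only the usernames. The function name strip_non_mentions and the sample text are made up for illustration.

```python
import re

# Try a whole @username first; otherwise consume one other character.
# "." does not match "\n", so newlines are left in place.
TOKEN = re.compile(r"(@\w+)|.")

def strip_non_mentions(text: str) -> str:
    """Delete every character that is not part of an @username.

    Each kept username is followed by one space, so lines with mentions
    end with a trailing space and lines without mentions become blank.
    """
    return TOKEN.sub(lambda m: m.group(1) + " " if m.group(1) else "", text)

sample = "RT @alice: hi @bob_42\nno mentions here\n@carol!"
print(repr(strip_non_mentions(sample)))  # prints '@alice @bob_42 \n\n@carol '
```

Like the Vim one-liner, this leaves trailing spaces and blank lines behind; matching the usernames directly is usually simpler.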
The idea is to capture each match via :s and, in the replacement, execute an expression that builds up the matches in a register. Then delete all the lines with :d and paste the register with the matches back into the buffer.
:let @a = ""
:%s/@\w\+/\=setreg('A', submatch(0), 'l')/gn
:%d_
:%pu a
:1d_
Clear the a register via let @a = ""
@\w\+ is the pattern: an @ followed by one or more word characters
\= inside the replacement of the :s executes an expression
setreg() sets the value of the register; the uppercase name 'A' appends to register a instead of overwriting it
submatch(0) yields the matched content
'l' makes the register linewise, so each match goes on its own line
The g flag captures every match on a line, not only the first one
The n flag prevents the buffer from being altered (optional)
:%d_ deletes the entire buffer into the black hole register
:pu a puts the a register
:1d_ removes the empty first line
It may be a bunch to type compared to :%!grep -E -o '@\w+', but it is a pure Vim solution. We can shorten it into a single line if that would be better:
:let @a = "" | %s/@\w\+/\=setreg('A', submatch(0), 'l')/gn | %d_ | %pu a | 1d_
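The same capture-into-a-register idea can be sketched in Python, assuming the buffer is modelled as a plain list of lines (a hypothetical stand-in for a Vim buffer). A local list plays the part of register a, and finditer visits every match on a line, much like the :s g flag. The name extract_in_place is illustrative only.

```python
import re

MENTION = re.compile(r"@\w+")  # Python spelling of the Vim pattern @\w\+

def extract_in_place(buffer: list[str], pattern: re.Pattern = MENTION) -> None:
    """Replace the contents of buffer with every match of pattern, one per line."""
    register: list[str] = []  # like :let @a = ""
    for line in buffer:
        for match in pattern.finditer(line):
            register.append(match.group(0))  # like setreg('A', submatch(0), 'l')
    buffer[:] = register  # like :%d_ followed by :%pu a and :1d_

buf = ["RT @alice: hi @bob_42", "no mentions here", "@carol!"]
extract_in_place(buf)
print(buf)  # prints ['@alice', '@bob_42', '@carol']
```

Passing a different compiled pattern extracts something else, which is the same generalisation a user command gives you in Vim.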
Even that is probably not good enough if you have to do this kind of thing on a regular basis. Here is a quick 'n' dirty command to put in your ~/.vimrc file:
" Extractomatic
" Replace the current buffer with each match on a separate line
" Register a is saved in s:var and restored afterwards
" Usage:
" :Extractomatic/pattern/
command! -nargs=+ Extractomatic
\ let s:var = @a |
\ let @a = "" |
\ %s<args>\=setreg('A', submatch(0), 'l')/gn |
\ %d_ |
\ %pu a |
\ 1d_ |
\ let @a = s:var
Now you can just do :Extractomatic/@\w\+/.
However, there are more robust solutions to this, like Ingo Karkat's Extract Matches plugin and the Yankitute plugin.
Personally, I think whichever way you choose to solve this problem is fine. However, knowing how to use :s with a sub-replace-expression is a great way to level up your vim-script-fu.
For more help see:
:h :s
:h sub-replace-expression
:h submatch(
:h setreg(
:h registers
:h :d
:h :pu
:h range
Upvotes: 0
Reputation: 786091
You can use this regex in Vim:
@\@<!\<\w\+\>
This will match all words that are not preceded by an @ character. To erase them, use it in a substitution such as :%s/@\@<!\<\w\+\>//g.
To match all runs of non-space characters not preceded by an @ character, use:
@\@<!\<\S\+\>
\@<! is the syntax for a negative lookbehind in Vim, so @\@<! is equivalent to (?<!@) in other (Perl-style) regex flavors.
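As a rough cross-check, the same negative lookbehind written in Python's (?<!@) syntax is sketched below; \b stands in for Vim's \< and \>. This is an illustration, not an exact equivalent: Vim's \w is ASCII-only ([0-9A-Za-z_]), while Python's \w also matches non-ASCII letters in str patterns. The name erase_plain_words is hypothetical.

```python
import re

# (?<!@) is the negative look-behind; \b...\b plays the role of Vim's \< and \>.
WORD_NOT_AFTER_AT = re.compile(r"(?<!@)\b\w+\b")

def erase_plain_words(text: str) -> str:
    """Erase every word that is not immediately preceded by '@'."""
    return WORD_NOT_AFTER_AT.sub("", text)

print(repr(erase_plain_words("RT @alice: hi @bob_42")))  # prints ' @alice:  @bob_42'
```

As with the Vim substitution, spaces and punctuation between the words are left behind.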
Upvotes: 3
Reputation: 4485
I don't know why you want to do this in Vim. I assume you have a Unix/Linux OS since you mention Vim. Thanks to "extract words from a file", I found the following solution:
grep -o -E '@\w+' twitterlog.txt > usernames.txt
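If grep is not available, a small Python script can play the same role. This is a sketch that assumes the input is UTF-8 text; the file names come from the grep example, and the script name extract_usernames.py is hypothetical. Like grep -o, it writes each match on its own line, including several matches from one input line. Python's \w is Unicode-aware, so results can differ from grep for non-ASCII names.

```python
import re
import sys
from typing import Iterable, Iterator

MENTION = re.compile(r"@\w+")

def extract_usernames(lines: Iterable[str]) -> Iterator[str]:
    """Yield every @username in order, one item per match (like grep -o)."""
    for line in lines:
        for match in MENTION.finditer(line):
            yield match.group(0)

def main(argv: list[str]) -> None:
    src, dst = argv[1], argv[2]
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for name in extract_usernames(fin):
            fout.write(name + "\n")

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python extract_usernames.py twitterlog.txt usernames.txt
    main(sys.argv)
```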
Upvotes: 0