Reputation: 2225
Let's say I've got a file with multiple lines like
A.B C B.DAT
E.F C F1.DAT
I'd like to identify the lines that contain a duplicate element (for example B), but only if the duplicate is followed by .DAT. (Note that each element A, B, C, ... can be of any length.)
So in the example above, the first line should match and the second shouldn't.
I would then like to remove the duplicate (here B.DAT), so how can I make sure that only the second occurrence on each line is matched?
Upvotes: 0
Views: 653
Reputation: 40533
This regular expression should do what you want (if I understood you correctly):
/\(.\).*\zs\1\.DAT
This translates to
\(     2: start a group, to "keep" what it matches for later with `\1`
.      1: match any single character
\)     2: end of the group
.*     3: match any number of characters ...
\zs    4: (and set the start of the matched region here)
\1     5: ... followed by the kept character (step 2)
\.DAT  6: followed by .DAT
With this regular expression you can remove the B.DAT with a substitution:
%s/\(.\).*\zs\1\.DAT//
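If you want to sanity-check the pattern outside Vim, here is a rough Python sketch of the same substitution. Python's `re` module has no `\zs`, so this version (an assumption on my part, not something Vim does) captures the text before the duplicate in a second group and puts it back. The name `strip_duplicate` is made up for the example.

```python
import re

# Stand-in for Vim's \(.\).*\zs\1\.DAT
# Group 1 is the kept character, group 2 is everything up to the duplicate.
# Python lacks \zs, so groups 1 and 2 are re-inserted by the replacement.
DUP_DAT = re.compile(r'(.)(.*)\1\.DAT')

def strip_duplicate(line):
    """Remove the repeated character plus '.DAT', keeping the text before it."""
    # count=1 mirrors :s without the g flag (first match per line only).
    return DUP_DAT.sub(r'\1\2', line, count=1)

print(repr(strip_duplicate("A.B C B.DAT")))   # duplicate B.DAT is removed
print(repr(strip_duplicate("E.F C F1.DAT")))  # no duplicate, line unchanged
```

Note that the space in front of the removed B.DAT stays on the line, just as it does with the Vim command.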
Update: It turns out that the duplicate can consist of multiple characters. In that case, the regular expression becomes
\(\S\+\).*\zs\1\.DAT
The \S\+ now matches one or more non-whitespace characters; the rest of the regular expression is the same.
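To only identify the affected lines (the first part of the question), the multi-character pattern can be tried in Python as well. This is just a sketch under the assumption that Python's `(\S+)` and backreference behave like Vim's `\(\S\+\)` and `\1` here; `duplicated_elements` is a hypothetical helper name.

```python
import re

# Stand-in for Vim's \(\S\+\).*\zs\1\.DAT; \zs is not needed because
# nothing is replaced, the pattern is only used to find matching lines.
DUP_DAT = re.compile(r'(\S+).*\1\.DAT')

def duplicated_elements(lines):
    """Return (line_number, element) pairs for lines that repeat an element before .DAT."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = DUP_DAT.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits

sample = ["A.B C B.DAT", "E.F C F1.DAT", "FOO.BAR C BAR.DAT"]
print(duplicated_elements(sample))
```

One caveat: the group is not tied to element boundaries, so a line like `XBAR.Y C BAR.DAT` is also reported, with `BAR` as the duplicate, even though `BAR` is only the tail of `XBAR`.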
Upvotes: 8