Dr.Elch

Reputation: 2225

Find duplicates in single line with vim

Let's say I've got a file with multiple lines like

A.B C B.DAT
E.F C F1.DAT

I'd like to identify the lines that contain a duplicate (B, for example), but only if the duplicate is followed by .DAT. (Note that each element A, B, C, ... can be of any length.)

So in the example above, the first line should return a match and the second shouldn't.

I would like to proceed with removing the duplicate (which would be B.DAT), so how can I make sure that only the second occurrence on each line is matched?

Upvotes: 0

Views: 653

Answers (1)

René Nyffenegger

Reputation: 40533

This regular expression should do what you want (if I understood you correctly):

/\(.\).*\zs\1\.DAT

This translates to

\(         2: start a group and "keep" what it matches for later use as `\1`
 .      1: match any single character
\)         2: end of the group
.*            3: match any number of characters ...
\zs              4: (and set the start of the matched region here)
\1                  5: ... followed by the kept character (step 2)
\.DAT                  6: followed by a literal .DAT

With this regular expression, you can remove the B.DAT with:

%s/\(.\).*\zs\1\.DAT//
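Before deleting anything, it may be worth listing the lines the pattern would hit. The sketch below uses the same pattern with `:g` and `:p`, which only print the matching lines. `\zs` is left out because it only changes where the match starts, not which lines match.

```vim
" Print every line that has a character repeated later on and followed by .DAT.
" Nothing in the buffer is modified.
:g/\(.\).*\1\.DAT/p

" Then remove the second occurrence (the part after \zs) on each matching line.
:%s/\(.\).*\zs\1\.DAT//
```

Since the match only starts at `\zs`, any whitespace in front of the removed text stays in place. In the example, `A.B C B.DAT` becomes `A.B C ` with a trailing space.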

Update: It turns out that the duplicate can consist of multiple characters. In that case, the regular expression becomes \(\S\+\).*\zs\1\.DAT. The \S\+ now matches one or more non-whitespace characters; the rest of the regular expression is the same.
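Put into a substitute command, the multi-character version might look like the first line below. The second command is an optional stricter variant that is not part of the original answer. It assumes the elements are made of keyword characters (as defined by the 'iskeyword' option) and adds the word boundaries `\<` and `\>` so that only a whole element can be captured, not a fragment of a longer one such as the B in AB.

```vim
" Multi-character duplicates: remove the second occurrence plus .DAT.
:%s/\(\S\+\).*\zs\1\.DAT//

" Assumed stricter variant: the kept text must be a whole word, and the
" duplicate must start at a word boundary. Behaviour depends on 'iskeyword'.
:%s/\<\(\S\+\)\>.*\zs\<\1\.DAT//
```

With the stricter variant, `A.B C B.DAT` is still changed, while a line like `AB C B.DAT` is left alone, because the B in AB is not a whole word.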

Upvotes: 8
