Reputation: 53
I have a long list of lines with many cases like this: lines that share the same second word (the second string after a space) while the rest of the line differs. I need to keep only one line for each second word. It only has to work for lines with the same second word, and those lines are always consecutive. For example, I have these lines:
lineA 12345
lineB 12345
lineC 12345
lineD 788878
lineE 110881
lineF 110881
lineG 110881
lineH 287778
lineJ 251287
lineK 242424
lineL 242424
lineM 242424
and I want this result:
lineA 12345
lineD 788878
lineE 110881
lineH 287778
lineJ 251287
lineK 242424
So, if the second word of a line matches the second word of the line before it, delete all but one of those lines. I tried to write a regex, but it only deletes lines whose first word matches, and I can't figure out how to make it work for the second word after the space, as in the example.
^(\S++).*\K(?:\R\1(?:\h.*|$))+
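For comparison, outside the editor this is a plain grouping problem. Below is a minimal Python sketch, assuming the file fits in memory as a list of lines and that words are separated by whitespace; `second_word` and `keep_first_per_run` are hypothetical helper names, not part of any library.

```python
from itertools import groupby


def second_word(line):
    """Return the second whitespace-separated word, or None if there is none."""
    parts = line.split()
    return parts[1] if len(parts) > 1 else None


def keep_first_per_run(lines):
    """Keep the first line of every run of consecutive lines that share
    the same second word; the later lines of the run are dropped."""
    # groupby starts a new group whenever the key changes, so only
    # *adjacent* lines with an equal second word end up in one group.
    # Adjacent lines with no second word (key None) also group together.
    return [next(group) for _, group in groupby(lines, key=second_word)]


if __name__ == "__main__":
    sample = [
        "lineA 12345", "lineB 12345", "lineC 12345", "lineD 788878",
        "lineE 110881", "lineF 110881", "lineG 110881", "lineH 287778",
        "lineJ 251287", "lineK 242424", "lineL 242424", "lineM 242424",
    ]
    for line in keep_first_per_run(sample):
        print(line)
```

Because `itertools.groupby` only merges adjacent items, a second word that reappears later in the file (not consecutively) is kept, which fits the "always consecutive" condition above.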
Upvotes: 3
Views: 278
Reputation: 48711
You don't need all those dot-stars; they will slow things down. A proper and shorter version of yours would be:
^\S+\K( \S++)([^ ]+\1)+
and replace all matches with $1
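Python's `re` module has no `\K`, so a rough translation of this pattern captures the first word and writes it back in the replacement. This is a sketch, not the answerer's code. The `(?!\S)` checks are an addition: they stand in for the possessive `\S++` and also require the repeated word to end where the first one does, which the pattern above does not check (with it, `lineA 12345` followed by `lineB 123456` becomes `lineA 123456`). `collapse_runs` is a hypothetical name.

```python
import re

# Python translation of ^\S+\K( \S++)([^ ]+\1)+ (replace with $1).
# No \K in Python: the first word is captured as group 1 and written back.
# [^ ]+ spans the line break plus the next line's first word.
# (?!\S) means "no non-space character follows": after ( \S+) it forces the
# whole word into group 2 (like \S++), and after \2 it stops " 12345" from
# matching the start of " 123456".
RUN = re.compile(r"^(\S+)( \S+)(?!\S)(?:[^ ]+\2(?!\S))+", re.MULTILINE)


def collapse_runs(text):
    """Replace each run of consecutive lines sharing a second word with its first line."""
    return RUN.sub(r"\1\2", text)
```

On the sample from the question (one line per entry, newline-terminated) this returns the six expected lines.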
Upvotes: 3
Reputation: 91385
Find what: ^\S+\h+(\S+)\R\K(?:\S+\h+\1(?:\R|\Z))+
Replace with: LEAVE EMPTY
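A rough Python equivalent, as a sketch: Python's `re` has no `\h`, `\R` or `\K`, so `[ \t]`, `\r?\n` and a captured group written back with `\1` are assumed stand-ins (`\R` in Notepad++ also matches a few rarer line breaks that `\r?\n` does not). The `\2(?:\r?\n|\Z)` ending is kept from the answer, so the second word must match completely. `delete_repeats` is a hypothetical name.

```python
import re

# Rough translation of ^\S+\h+(\S+)\R\K(?:\S+\h+\1(?:\R|\Z))+ (replace: empty).
# Stand-ins: [ \t] for \h, \r?\n for \R. Without \K, the line to keep is
# captured as group 1 and put back with \1 instead of being left unmatched.
REPEATS = re.compile(
    r"^(\S+[ \t]+(\S+)\r?\n)"        # group 1: kept line, group 2: its second word
    r"(?:\S+[ \t]+\2(?:\r?\n|\Z))+",  # following lines with the same second word
    re.MULTILINE,
)


def delete_repeats(text):
    """Keep the first line of each run of consecutive lines sharing a second word."""
    return REPEATS.sub(r"\1", text)
```

One side effect: if the last run sits at the very end of the text with no trailing newline, the kept line still ends with the newline captured in group 1, so `delete_repeats("a 1\nb 1")` gives `"a 1\n"`.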
Upvotes: 2
Reputation: 8076
This can be done by capturing 2 groups: the first is the original line you want to keep, (\S+ (\d+)), and the second is the nested group holding the repeating digits (in your case (\d+)).
We then greedily match all subsequent lines whose digits repeat group 2, \2, with (?:\R\S+ \2)+, and replace all of them with the first line, $1.
Find Regex Without Newlines:
(\S+ (\d+))(?:\R\S+ \2)+
Replace All With: $1
Edit: Thanks, Aaron, for the newline trick! Learned something new after 16 years in npp!
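In Python the same idea can be sketched with `re.sub`, with `\r?\n` approximating `\R`. The `^` anchor and the `(?=\r?$)` lookahead are additions, not part of the answer above: they make `\2` match a whole number at the end of its line, so `12345` does not match the first digits of `123456`. As in the answer, `(\d+)` assumes the second word is numeric. `collapse_numeric_runs` is a hypothetical name.

```python
import re

# Sketch of (\S+ (\d+))(?:\R\S+ \2)+ with replacement $1.
# Group 1 is the line to keep, group 2 the number after the first word.
# Additions: ^ (with MULTILINE) anchors at a line start, and (?=\r?$)
# requires the repeated number to end its line.
NUMERIC_RUN = re.compile(r"^(\S+ (\d+))(?:\r?\n\S+ \2(?=\r?$))+", re.MULTILINE)


def collapse_numeric_runs(text):
    """Replace each run of consecutive lines with the same numeric second word by its first line."""
    return NUMERIC_RUN.sub(r"\1", text)
```

Since the match stops before the last line's line break, the original line endings (including Windows `\r\n`) are left in place.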
Upvotes: 2