Reputation: 97

Remove/Delete all duplicate lines

There are a lot of ways to remove duplicate lines while keeping one copy, but I want to keep only the lines that occur once and delete every copy of the duplicated lines.

From something like this:

Duplicate
Duplicate
Important text
Other duplicate
Important text1
Other duplicate

To get this:

Important text
Important text1

There are thousands of lines I need to remove, and the unique lines are just 10-20 mixed with all those duplicate lines.

Upvotes: 5

Answers (3)

benbo

Reputation: 1526

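If you are on a Unix system and the lines are in a file, you can open a terminal and run

$ sort -u file.txt > uniquelines.txt

This keeps one copy of every line. If you want every duplicated line removed entirely, so that only the lines occurring once remain, run

$ sort file.txt | uniq -u

Note that both commands output the lines in sorted order, not in their original order.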

Upvotes: 2

Toto

Reputation: 91518

Have a try with:

Find what: ^(.+)\R([\s\S]*?)\1$
Replace with: $2

Make sure "Regular expression" and "Match case" are checked, and that ". matches newline" is NOT checked.
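
To see what this replacement does outside Notepad++, here is a minimal Java sketch that applies the same pattern to the sample from the question. The class and method names are hypothetical, `\R` assumes Java 8 or later, and Java's regex engine is not the one Notepad++ uses, so treat it as an illustration rather than a guarantee of identical behaviour.

```java
import java.util.regex.Pattern;

public class PairwiseRemoval {
    // The pattern from this answer; MULTILINE makes ^ and $ match at line boundaries.
    private static final Pattern PAIR =
            Pattern.compile("^(.+)\\R([\\s\\S]*?)\\1$", Pattern.MULTILINE);

    // Removes a line and the nearest later repeat of its text that ends a line,
    // keeping whatever was between the two (group 2).
    static String removePairs(String text) {
        return PAIR.matcher(text).replaceAll("$2");
    }

    public static void main(String[] args) {
        String sample = "Duplicate\nDuplicate\nImportant text\n"
                + "Other duplicate\nImportant text1\nOther duplicate";
        System.out.println(removePairs(sample));
    }
}
```

In this sketch the matches are consumed in pairs, so a line with an odd number of copies, or duplicates that interleave with another duplicated line, can survive a single pass. It is worth checking the result and repeating the replacement if needed.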

Upvotes: 4

m.cekiera

Reputation: 5385

I think a regex could help. You can first find the repeated lines with something like this:

^(.+)$(?=[\s\S]*^(\1)$[\s\S]*)

Then remove every occurrence of the matched fragment from the text. However, I don't think Notepad++ has such a capability.

This regex matches only the first occurrence of a repeated line and captures a later copy in a group. A single regex match cannot cover non-contiguous text, so the copies cannot all be removed in one match.

Example in Java:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test{
    public static void main(String[] args){
        String test = "Duplicate\n" +
                "Duplicate\n" +
                "Important text\n" +
                "Other duplicate\n" +
                "Important text1\n" +
                "Other duplicate";
        String result = test;
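        // Find each line that occurs again later in the text.
        Matcher matcher = Pattern.compile("^(.+)$(?=[\\s\\S]*^(\\1)$[\\s\\S]*)", Pattern.MULTILINE).matcher(test);
        while (matcher.find()) {
            // replace() treats the line as literal text, so regex metacharacters in it are safe.
            // The line breaks stay in place, hence the blank lines in the output.
            result = result.replace(matcher.group(), "");
        }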
        System.out.println(result);
    }
}

with result:

Important text

Important text1

However, if you use Replace All in Notepad++ with this regex, it should leave only one occurrence of a given line.
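
A regex-free alternative, as a minimal sketch: count how often each line occurs, then keep only the lines seen exactly once, in their original order. The class and method names are hypothetical, and it assumes the whole text fits in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UniqueLines {
    // Returns the lines that occur exactly once, in their original order.
    static List<String> keepUnique(List<String> lines) {
        // First pass: count the occurrences of every line.
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            counts.merge(line, 1, Integer::sum);
        }
        // Second pass: keep only the lines whose count is 1.
        List<String> unique = new ArrayList<>();
        for (String line : lines) {
            if (counts.get(line) == 1) {
                unique.add(line);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Duplicate", "Duplicate", "Important text",
                "Other duplicate", "Important text1", "Other duplicate");
        keepUnique(lines).forEach(System.out::println);
    }
}
```

Because whole lines are compared, "Important text" and "Important text1" are treated as different lines, and no blank lines are left behind.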

Upvotes: 4
