Reputation: 87
There are a lot of ways to remove duplicate lines, but I want to keep only the lines that occur exactly once and delete every line that has a duplicate.
From something like this:
Duplicate
Duplicate
Important text
Other duplicate
Important text1
Other duplicate
To get this:
Important text
Important text1
There are thousands of lines I need to remove, and the unique lines are just 10-20 mixed with all those duplicate lines.
Upvotes: 4
Views: 5366
Reputation: 1528
If you are on a Unix system and the lines are in a file, you can open a terminal and run the following, which keeps one copy of each distinct line:
$ sort -u file.txt > uniqelines.txt
If you actually want every duplicated line removed entirely, so that only lines occurring exactly once remain (in sorted order), you can run
$ sort file.txt | uniq -u
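For anyone without a Unix shell, the same count-and-filter idea behind `sort | uniq -u` can be sketched in Java. This is only a minimal sketch under a few assumptions: the class name `UniqueLines` and the method `keepUniqueLines` are made up for illustration, the whole text fits in memory, and lines are compared exactly (case-sensitive, no trimming). Unlike the pipeline, it keeps the surviving lines in their original order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of any library.
class UniqueLines {
    // Returns only the lines that occur exactly once, in their original order.
    static String keepUniqueLines(String text) {
        // Count how often each line occurs; LinkedHashMap keeps first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            counts.merge(line, 1, Integer::sum);
        }
        // Keep the lines whose count is exactly one.
        return counts.entrySet().stream()
                .filter(e -> e.getValue() == 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        String input = String.join("\n",
                "Duplicate", "Duplicate", "Important text",
                "Other duplicate", "Important text1", "Other duplicate");
        // Prints "Important text" and "Important text1" on separate lines.
        System.out.println(keepUniqueLines(input));
    }
}
```

Counting first and filtering afterwards avoids the regex pitfalls of matching one line inside another, at the cost of reading the text outside the editor.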
Upvotes: 2
Reputation: 91385
Have a try with:
Find what: ^(.+)\R([\s\S]*?)\1$
Replace with: $2
Make sure you've checked Regular Expression and Case sensitive, but NOT ". matches newline".
Upvotes: 3
Reputation: 5395
I think a regex could help. You can first find the repeated lines with something like this:
^(.+)$(?=[\s\S]*^(\1)$[\s\S]*)
then remove every occurrence of the matched fragment from the text. However, I don't think Notepad++ has such a capability.
This regex matches only the first occurrence of a line and captures a later one in a group. But a single regex match cannot cover non-contiguous text.
Example in Java:
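import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        String test = "Duplicate\n" +
                "Duplicate\n" +
                "Important text\n" +
                "Other duplicate\n" +
                "Important text1\n" +
                "Other duplicate";
        String result = test;
        // Find every line that occurs again later in the text.
        Matcher matcher = Pattern.compile("^(.+)$(?=[\\s\\S]*^(\\1)$[\\s\\S]*)", Pattern.MULTILINE).matcher(test);
        while (matcher.find()) {
            // Quote the line so it is matched literally, and remove whole lines only,
            // together with their line break, so no blank lines are left behind.
            String wholeLine = "(?m)^" + Pattern.quote(matcher.group()) + "$\\n?";
            result = result.replaceAll(wholeLine, "");
        }
        System.out.println(result);
    }
}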
with result:
Important text
Important text1
However, if you use Replace All in Notepad++ with this regex, it should leave only one occurrence of each line.
Upvotes: 4