Reputation: 45

How do I remove duplicate lines using Sed without sorting?

I've been trying to delete duplicate lines using only sed, but I can't figure out how to do it.

So far I've tried this and it hasn't worked.

sed '$!N; /^\(.*\)\n\1$/!P; D'

file:

APPLE

ORANGES

BANANA

BANANA

COOKIES

FRUITS

What I got:

APPLE

ORANGES

BANANA

BANANA

COOKIES

FRUITS

What I want:

APPLE

ORANGES

BANANA

COOKIES

FRUITS

I'd like to avoid going through the file line by line and deleting the duplicates by hand.

My goal is for the command to delete the second instance of BANANA.

Can anyone point me in the right direction?

Thanks

Upvotes: 1

Answers (4)

Martin Vahi

Reputation: 139

Assuming you wanted sed because it is fast and available on Linux as a standard tool, you may want to consider another standard Linux command-line tool, "uniq", sometimes combined with yet another standard tool, "sort". Note that "uniq" on its own removes only adjacent duplicates, and adding "sort" changes the order of the lines.

ts3b@terminal01:~/demo$ ls
repeated_lines.txt
ts3b@terminal01:~/demo$ cat ./repeated_lines.txt 
AAA 
BBB
BBB
CCC
AAA 
AAA 
BBB
CCC
ts3b@terminal01:~/demo$ cat ./repeated_lines.txt | uniq
AAA 
BBB
CCC
AAA 
BBB
CCC
ts3b@terminal01:~/demo$ cat ./repeated_lines.txt | uniq | sort
AAA 
AAA 
BBB
BBB
CCC
CCC
ts3b@terminal01:~/demo$ cat ./repeated_lines.txt | uniq | sort | uniq
AAA 
BBB
CCC
ts3b@terminal01:~/demo$

On Linux, "sed" is GNU sed, which behaves differently from the "sed" command on FreeBSD. GNU sed may be available on FreeBSD as "gsed". For some regular expressions the two behave the same way, but if you want to save time by testing your expressions against only one of them, for example GNU sed, here is a Bash snippet that may help make a script work on both FreeBSD and Linux:

# Use "gsed" where uname reports a BSD system; plain "sed" elsewhere.
S_CMD_GNU_SED="sed"
if uname -a | grep -qi 'BSD'; then
    S_CMD_GNU_SED="gsed"
fi
#
# There's a similar case with GNU Make versus BSD Make:
#
S_CMD_GNU_MAKE="make"
if uname -a | grep -qi 'BSD'; then
    S_CMD_GNU_MAKE="gmake"
fi
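As a rough sketch of how such a variable might then be used, the block below selects the sed binary and calls it through a small wrapper. The function name `dedup_adjacent` is made up for illustration, and the sed program is the one from the question, so it only collapses duplicates that sit directly next to each other.

```shell
# Pick the sed binary: plain "sed" by default, "gsed" when uname reports BSD.
S_CMD_GNU_SED="sed"
if uname -a | grep -qi 'BSD'; then
    S_CMD_GNU_SED="gsed"
fi

# Hypothetical wrapper: collapse adjacent duplicate lines with the selected sed.
dedup_adjacent() {
    "$S_CMD_GNU_SED" '$!N; /^\(.*\)\n\1$/!P; D'
}

printf '%s\n' AAA BBB BBB CCC | dedup_adjacent
```

Quoting "$S_CMD_GNU_SED" keeps the call safe if the variable is later changed to a full path containing spaces.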

Thank You for reading my comment :-)

Upvotes: 0

potong

Reputation: 58578

This might work for you (GNU sed):

   sed -E '1s/^/\n/;:a;N;s/((\n\S+)(\n\S+)*)\n\2$/\1/;$!ba;s/.//' file

On the first line, insert a newline for regexp purposes.

Gather up the lines in the pattern space, removing duplicates when added (plus the empty line beforehand).

At end of the file, remove the introduced newline and print the result.
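To try the command above without creating a file, the question's sample (with a blank line between entries) can be piped in with printf. This is only a sketch and assumes GNU sed; the function name `dedup_potong` is illustrative and not part of the answer.

```shell
# The GNU sed command from this answer, wrapped so it reads standard input.
dedup_potong() {
    sed -E '1s/^/\n/;:a;N;s/((\n\S+)(\n\S+)*)\n\2$/\1/;$!ba;s/.//'
}

# Sample from the question: entries separated by blank lines, BANANA repeated.
printf '%s\n' APPLE '' ORANGES '' BANANA '' BANANA '' COOKIES '' FRUITS |
    dedup_potong
```

With this input the second BANANA and the blank line in front of it should be dropped, leaving the other entries and their separating blank lines in place.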

Upvotes: 1

sseLtaH

Reputation: 11247

Using sed

$ sed -n '/^$/d;G;/^\(.*\n\).*\n\1$/d;H;P;a\ ' input_file
APPLE

ORANGES

BANANA

COOKIES

FRUITS

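Delete blank lines. Append the hold space (the lines seen so far) to the pattern space with G. If the current line already occurs there, delete it; otherwise add it to the hold space with H, print it with P, and use a to output a blank line after it.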
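The same hold-space idea also handles duplicates that are not adjacent. The sketch below is not the command from this answer but a variant of a widely circulated sed one-liner, with `/^$/d` added to drop blank lines first. The function name `dedup_all` is illustrative, and the bracket expression `[ -~]` assumes lines made of printable ASCII only.

```shell
# Order-preserving dedup: the hold space collects every line printed so far.
#   /^$/d      drop blank lines
#   G          append the hold space (lines already seen) to the current line
#   s/\n/&&/   double the first newline so an adjacent duplicate also matches
#   /.../d     skip the line if it appears again among the lines already seen
#   s/\n//; h  undo the doubling and store the updated list in the hold space
#   P          print the current line only
dedup_all() {
    sed -n '/^$/d; G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
}

printf '%s\n' APPLE '' BANANA '' APPLE '' BANANA '' COOKIES | dedup_all
```

Because every unique line stays in the hold space, this approach is best suited to small files; memory use and matching time grow with the number of distinct lines.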

Upvotes: 4

clogwog

Reputation: 373

Hmm, that's odd; it seems to work for me. Is it because you have an empty line between each line of text?

~$ cat test.txt
APPLES
ORANAGES
BANANA
BANANA
COOKIES
FRUITS

~$ cat test.txt |  sed '$!N; /^\(.*\)\n\1$/!P; D'
APPLES
ORANAGES
BANANA
COOKIES
FRUITS
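The difference can be checked side by side. In this sketch the command from the question is wrapped in a function (the name `dedup_adjacent` is illustrative) and run on two inputs: one with the duplicates adjacent, one with a blank line between them.

```shell
# The sed command from the question; it compares each line with the next one only.
dedup_adjacent() {
    sed '$!N; /^\(.*\)\n\1$/!P; D'
}

# No blank lines: the two BANANA lines are adjacent, so one is dropped.
compact=$(printf '%s\n' APPLE ORANGES BANANA BANANA COOKIES | dedup_adjacent)

# Blank line in between: BANANA is never directly followed by BANANA,
# so nothing is removed.
spaced=$(printf '%s\n' BANANA '' BANANA | dedup_adjacent)

printf '%s\n---\n%s\n' "$compact" "$spaced"
```

If the blank lines in the original file are real, that would explain why the command appeared to do nothing there.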

Upvotes: 2
