Reputation: 45
I've been trying to delete duplicate lines using only sed, but I can't get it to work.
So far I've tried this, and it hasn't worked:
sed '$!N; /^\(.*\)\n\1$/!P; D'
file:
APPLE
ORANGES
BANANA
BANANA
COOKIES
FRUITS
What I got:
APPLE
ORANGES
BANANA
BANANA
COOKIES
FRUITS
What I want:
APPLE
ORANGES
BANANA
COOKIES
FRUITS
I want to avoid going through each line of a file by hand and deleting the duplicates myself.
My goal is for this to delete the second instance of BANANA.
Can anyone point me in the right direction?
Thanks
Upvotes: 1
Views: 2463
Reputation: 139
Assuming you wanted sed because it is fast and available on Linux as a standard tool, you may want to consider another standard command-line tool, "uniq", sometimes combined with yet another standard tool, "sort". Note that "uniq" only removes adjacent duplicate lines, which is why the session below also sorts the input.
ts3b@terminal01:~/demo$ ls
repeated_lines.txt
ts3b@terminal01:~/demo$ cat ./repeated_lines.txt
AAA
BBB
BBB
CCC
AAA
AAA
BBB
CCC
ts3b@terminal01:~/demo$ cat ./repeated_lines.txt | uniq
AAA
BBB
CCC
AAA
BBB
CCC
ts3b@terminal01:~/demo$ cat ./repeated_lines.txt | uniq | sort
AAA
AAA
BBB
BBB
CCC
CCC
ts3b@terminal01:~/demo$ cat ./repeated_lines.txt | uniq | sort | uniq
AAA
BBB
CCC
ts3b@terminal01:~/demo$
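As a hedged aside on the session above: uniq only collapses adjacent duplicates, so sorting is what makes the final result fully unique, and sort -u does both steps at once. A minimal sketch, using a temporary file whose contents mirror repeated_lines.txt (the variable names are illustrative):

```shell
# Recreate the sample data from the session above in a temporary file.
tmpfile=$(mktemp)
printf '%s\n' AAA BBB BBB CCC AAA AAA BBB CCC > "$tmpfile"

# uniq alone removes only adjacent repeats, so AAA, BBB and CCC each appear twice.
adjacent_only=$(uniq "$tmpfile")

# sort -u sorts the lines and drops every repeated line in one step.
fully_unique=$(sort -u "$tmpfile")

printf '%s\n' "$fully_unique"
rm -f "$tmpfile"
```

Keep in mind that sorting changes the original line order, which may or may not be acceptable for a given file.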
On most Linux distributions "sed" is GNU sed, which behaves differently from the "sed" command on FreeBSD. GNU sed may be available on FreeBSD as "gsed". For some regular expressions the two behave the same way, but if you want to save time by testing your expressions against only one of them, for example GNU sed, here is a candidate Bash snippet that may come in handy for making a script work on both FreeBSD and Linux:
# Default to plain "sed", which is GNU sed on most Linux systems.
S_CMD_GNU_SED="sed"
# uname -s prints only the kernel name, so a hostname containing "BSD" cannot match.
if uname -s | grep -qi 'BSD'; then
    S_CMD_GNU_SED="gsed"
fi
#
# There's a similar case with GNU Make versus BSD Make:
#
S_CMD_GNU_MAKE="make"
if uname -s | grep -qi 'BSD'; then
    S_CMD_GNU_MAKE="gmake"
fi
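An alternative sketch, not taken from the snippet above: instead of inspecting the operating system name, probe the tools themselves. It assumes that GNU sed prints a banner containing "(GNU sed)" for --version and that other sed implementations either reject the option or print something else; the function name pick_gnu_sed is hypothetical.

```shell
# Print the name of a GNU sed binary if one is found, else return non-zero.
pick_gnu_sed() {
    for candidate in sed gsed; do
        # Skip names that are not on PATH.
        command -v "$candidate" >/dev/null 2>&1 || continue
        # GNU sed identifies itself in its --version banner.
        if "$candidate" --version 2>/dev/null | grep -qF '(GNU sed)'; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Empty when no GNU sed could be found.
S_CMD_GNU_SED=$(pick_gnu_sed) || S_CMD_GNU_SED=""
```

Probing the binary avoids guessing from the platform name, at the cost of running each candidate once.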
Thank you for reading my comment :-)
Upvotes: 0
Reputation: 58381
This might work for you (GNU sed):
sed -E '1s/^/\n/;:a;N;s/((\n\S+)(\n\S+)*)\n\2$/\1/;$!ba;s/.//' file
On the first line, insert a leading newline so that every line in the pattern space is preceded by one, which keeps the regexp uniform.
Gather up the lines in the pattern space, removing each duplicate as it is appended (along with the empty line before it).
At the end of the file, remove the introduced newline and print the result.
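To see these steps in action, here is a small sketch. It assumes GNU sed and, as the parenthetical above implies, input in which every entry is separated by an empty line; the variable names are illustrative.

```shell
# Sample input with an empty line between entries (an assumption about the real file).
input=$(printf '%s\n\n' APPLE ORANGES BANANA BANANA COOKIES FRUITS)

# The command from the answer, reading standard input instead of a file (GNU sed only).
deduped=$(printf '%s\n' "$input" |
    sed -E '1s/^/\n/;:a;N;s/((\n\S+)(\n\S+)*)\n\2$/\1/;$!ba;s/.//')

# The second BANANA and the empty line before it should be gone.
printf '%s\n' "$deduped"
```

Because the whole file is collected in the pattern space before printing, this approach is best suited to small inputs.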
Upvotes: 1
Reputation: 11207
Using sed
$ sed -n '/^$/d;G;/^\(.*\n\).*\n\1$/d;H;P;a\ ' input_file
APPLE
ORANGES
BANANA
COOKIES
FRUITS
Delete blank lines. Append the hold space to the pattern space. If the line is a duplicate, delete it; otherwise append it to the hold space, print it, and insert a blank line.
Upvotes: 4
Reputation: 373
Hmm, that is odd; your command seems to work for me. Is it because you have an empty line between each line of text?
~$ cat test.txt
APPLES
ORANAGES
BANANA
BANANA
COOKIES
FRUITS
~$ cat test.txt | sed '$!N; /^\(.*\)\n\1$/!P; D'
APPLES
ORANAGES
BANANA
COOKIES
FRUITS
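A small sketch that may explain the difference, assuming the asker's real file has an empty line between entries (a guess, not confirmed in the question). The command compares each line only with the line immediately after it, so an empty line between the two BANANA entries hides the duplicate. The function and variable names are illustrative.

```shell
dedupe_adjacent() {
    # The command from the question: drop a line when the next line is identical.
    sed '$!N; /^\(.*\)\n\1$/!P; D'
}

# Consecutive duplicates are collapsed.
compact=$(printf '%s\n' APPLE BANANA BANANA COOKIES | dedupe_adjacent)

# With an empty line in between, the two BANANA lines are no longer neighbours,
# so the input comes back unchanged.
spaced=$(printf '%s\n' APPLE BANANA '' BANANA COOKIES | dedupe_adjacent)

printf '%s\n' "$compact" '---' "$spaced"
```

If that is the cause, removing the empty lines first, or using a command that accounts for them, should give the expected result.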
Upvotes: 2