Mikhail

Reputation: 65

Can't delete duplicate strings with shell commands

I have a file called "1.txt" which contains the following:

111
111
222
777
1111
777

I'm trying to delete duplicate strings from it. Both sort -u 1.txt and sort 1.txt | uniq return this:

111
1111
222
777
777

Question:

Why does the string "777" still appear twice? How can I remove the duplicate?

Upvotes: 0

Views: 84

Answers (2)

Lester_wu

Reputation: 171

Try using sed to delete any non-digit characters at the end of each line, then use sort and uniq to remove the duplicate lines.

sed 's/[^0-9]\{0,\}$//' 1.txt | sort | uniq

where s      : substitute the matched pattern
      [^0-9] : match a non-digit character
      \{0,\} : zero or more occurrences of the preceding pattern
      $      : match the end of the line
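
Assuming the second "777" ends with a single stray character such as a trailing space or a carriage return, the pipeline should strip it and collapse the duplicates:

$ sed 's/[^0-9]\{0,\}$//' 1.txt | sort | uniq
111
1111
222
777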

Upvotes: 0

alb3rtobr

Reputation: 356

Probably one of the "777" lines has a hidden character at the end. Try checking the length of each line of your file with:

$ awk '{ print length($0); }' 1.txt

Compare the lengths of the two "777" lines; they should differ in your file.
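
For example, if the second "777" carries one hidden trailing character (say, a carriage return left over from a Windows editor), the output would look something like this, with the last line one character longer:

$ awk '{ print length($0); }' 1.txt
3
3
3
3
4
4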

Upvotes: 2
