Reputation: 65
I have a file called "1.txt" which contains the following:
111
111
222
777
1111
777
I'm trying to delete duplicate strings from it. Both sort -u 1.txt
and sort 1.txt | uniq
return this:
111
1111
222
777
777
Question:
Why does the string "777" still appear twice? How can I remove the duplicate?
Upvotes: 0
Views: 84
Reputation: 171
Use sed to strip any non-digit characters from the end of each line, then use sort and uniq to remove the duplicate lines.
sed 's/[^0-9]\{0,\}$//' 1.txt | sort | uniq
where s : substitutes the matched text (here, with nothing)
[^0-9] : matches any non-digit character
\{0,\} : matches zero or more occurrences of the preceding pattern (same as *)
$ : matches the end of the line
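To see the effect, here is a small sketch that reproduces the problem. It assumes the hidden character is a trailing space; the real file may contain something else, such as a carriage return. The temporary file and the variable names are only for illustration.

```shell
# Hypothetical reproduction: the first "777" line carries a trailing space.
tmp=$(mktemp)
printf '111\n111\n222\n777 \n1111\n777\n' > "$tmp"

# Plain sort -u keeps both "777" variants, because they differ by one character.
before=$(sort -u "$tmp" | wc -l | tr -d ' ')

# Strip trailing non-digits first, then de-duplicate.
cleaned=$(sed 's/[^0-9]*$//' "$tmp" | sort -u)

echo "distinct lines before cleaning: $before"
printf '%s\n' "$cleaned"
rm -f "$tmp"
```

Note that this only works as intended when every line is supposed to end in a digit; legitimate trailing non-digit characters would be removed as well.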
Upvotes: 0
Reputation: 356
One of the "777" lines probably has a hidden character at the end, such as a trailing space or a carriage return. Try checking the length of each line of your file with:
$ awk '{ print length($0); }' 1.txt
Compare the lengths of the two "777" lines; they should be different in your file.
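If you also want to see what the hidden character is, something like the following may help. It is only a sketch on a made-up sample file where one line has a trailing space; substitute 1.txt for "$tmp" to check your own file.

```shell
# Hypothetical sample: the first "777" line has a trailing space.
tmp=$(mktemp)
printf '111\n777 \n777\n' > "$tmp"

# Print line number, length, and the line in brackets,
# so lines that look equal but differ in length stand out.
lengths=$(awk '{ print NR ": " length($0) " [" $0 "]" }' "$tmp")
printf '%s\n' "$lengths"

# sed's "l" command prints each line unambiguously: the end of the line
# is marked with "$" and non-printing characters appear as escapes (e.g. \r).
visible=$(sed -n 'l' "$tmp")
printf '%s\n' "$visible"
rm -f "$tmp"
```

In the sed output, a line shown as "777 $" instead of "777$" has a trailing space, and one shown as "777\r$" has Windows-style line endings.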
Upvotes: 2