Reputation: 147
I have a file like the one below.
file1:
No name city country
1 xyz yyyy zzz
No name city country
2 test dddd xxxx
No name city country
3 xyz yyyy zzz
I want to delete the duplicate lines from this file, except for the first occurrence, and save the result in the same file.
I have tried the code below, but it did not help.
header=$(head -n 1 file1)
(printf "%s\n" "$header";
grep -vFxe "$header" file1
) > file1
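I suspect the > file1 redirection truncates file1 before grep can read it, but I am not sure how to do this correctly in-place.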
Upvotes: 2
Views: 1508
Reputation: 85760
Quite simple in Awk: just use all the fields in the row as a unique key,
awk '!unique[$1$2$3$4]++' file > new-file
which produces this output:
No name city country
1 xyz yyyy zzz
2 test dddd xxxx
3 xyz yyyy zzz
A more readable version in Awk, looping over all the fields in the row (up to NF), would be
awk '{key=""; for(i=1;i<=NF;i++) key=key$i;}!unique[key]++' file > new-file
Or, a much more readable version from Sundeep's comment below, using $0, meaning the whole line contents:
awk '!unique[$0]++' file
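For reference, the !unique[$0]++ idiom is shorthand for roughly the following sketch (seen is just an arbitrary array name): the array value starts at 0, so the condition is true the first time a line appears and the line is printed; the post-increment then marks it as seen.
awk '{if (!seen[$0]) print; seen[$0]++}' file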
Follow-up question from the OP: how to save the result in-place.
Recent versions of GNU Awk (since 4.1.0) have the option of "inplace" file editing:
[...] The "inplace" extension, built using the new facility, can be used to simulate the GNU "sed -i" feature. [...]
Example usage:
gawk -i inplace '{key=""; for(i=1;i<=NF;i++) key=key$i;}!unique[key]++' file
To keep a backup of the original file:
gawk -i inplace -v INPLACE_SUFFIX=.bak '{key=""; for(i=1;i<=NF;i++) key=key$i;}!unique[key]++' file
Or, if your Awk does not support in-place editing, write to a temporary file and move it back over the original:
tmp=$(mktemp)
awk '{key=""; for(i=1;i<=NF;i++) key=key$i;}!unique[key]++' file > "$tmp" && mv "$tmp" file
Upvotes: 4