Sampath

Reputation: 147

How to remove duplicate headers from a file except the first occurrence in Linux

I have a file like the one below.

file1:

No name city country
1  xyz yyyy zzz
No name city country
2 test dddd xxxx
No name city country
3  xyz yyyy zzz

I want to delete the duplicate lines from this file except the first occurrence and save the result in the same file.

I have tried the code below, but it did not help.

header=$(head -n 1 file1)
(printf "%s\n" "$header";
 grep -vFxe "$header" file1
) > file1

Upvotes: 2

Views: 1508

Answers (1)

Inian

Reputation: 85760

This is quite simple in Awk: just use all the fields in the row as a unique key,

awk '!unique[$1$2$3$4]++' file > new-file

which produces the output

No name city country
1  xyz yyyy zzz
2 test dddd xxxx
3  xyz yyyy zzz
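
To see why this keeps only the first occurrence: unique[key]++ evaluates to the count before incrementing, so the pattern is true (and the line is printed, Awk's default action) only the first time a key is seen. An expanded, commented sketch that behaves the same way:

awk '{
    key = $1 $2 $3 $4        # concatenate the first four fields into one key
    if (unique[key] == 0)    # true only the first time this key appears
        print                # print the whole line
    unique[key]++            # remember the key for subsequent rows
}' file > new-file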

A more general version in Awk, which loops up to NF to build the key from every field in the row, would be

awk '{key=""; for(i=1;i<=NF;i++) key=key$i;}!unique[key]++' file > new-file
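
One caveat with both forms above: concatenating fields without a separator can make distinct rows collide (for example, the field pairs "ab c" and "a bc" both produce the key "abc"). A variant that puts Awk's built-in SUBSEP between fields avoids that; a sketch, not part of the original answer:

awk '{key=""; for(i=1;i<=NF;i++) key=key SUBSEP $i} !unique[key]++' file > new-file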

Or, a much more readable version from Sundeep's comment, using $0, which holds the whole line contents (note that this compares lines literally, so lines differing only in whitespace are treated as distinct),

awk '!unique[$0]++' file

To address the OP's follow-up question about saving the file in place:

Recent versions of GNU Awk (since 4.1.0) have the option of "inplace" file editing:

[...] The "inplace" extension, built using the new facility, can be used to simulate the GNU "sed -i" feature. [...]

Example usage:

gawk -i inplace '{key=""; for(i=1;i<=NF;i++) key=key$i;}!unique[key]++' file

To keep the backup:

gawk -i inplace -v INPLACE_SUFFIX=.bak '{key=""; for(i=1;i<=NF;i++) key=key$i;}!unique[key]++' file

Or, if your Awk does not support that, write to a temporary file and move it over the original (redirecting Awk's output directly back to the input file would truncate it before it is read, which is also why the attempt in the question clobbered file1):

tmp=$(mktemp) 
awk '{key=""; for(i=1;i<=NF;i++) key=key$i;}!unique[key]++' file > "$tmp" && mv "$tmp" file 
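
If you need this more than once, the temp-file pattern can be wrapped in a small shell function; a minimal sketch (the function name dedupe_inplace is hypothetical, not part of the original answer):

dedupe_inplace() {
    # Keep only the first occurrence of each line, editing "$1" in place
    tmp=$(mktemp) || return 1
    awk '!unique[$0]++' "$1" > "$tmp" && mv "$tmp" "$1"
}
dedupe_inplace file1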

Upvotes: 4
