remove duplicate information in .dat file (awk, sed)

Question

I have several large files to which the corrected data was accidentally appended instead of replacing the old contents. Each file therefore contains the data twice: the first block (at the top) is incorrect, and the most recent block, including its header, is correct:

H1 H2 H3 DATA SHIFT PROD VAL
12 12 13 8189 2 392 10
12 13 12 8199 3 281 11
...
...
H1 H2 H3 DATA SHIFT PROD VAL
12 12 13 8189 2 392 10
12 13 12 8199 3 281 15
...
...

I want to delete the first header and all the rows that belong to it, keeping only the second block. How would I go about doing this? So far I can only get sed to match 1 or 2 characters, and it deletes everything after the match, not before.

The expected output should simply be:

H1 H2 H3 DATA SHIFT PROD VAL
12 12 13 8189 2 392 10
12 13 12 8199 3 281 15
...
...

karakfa · Accepted Answer

awk to the rescue!

$ awk 'NR==1{h=$0; next} $0==h{p++} p' file

H1 H2 H3 DATA SHIFT PROD VAL
12 12 13 8189 2 392 10
12 13 12 8199 3 281 15
...
...

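This records the header from the first line, skips that line, and starts printing once the same header line is seen again, so only the second block (header included) is output.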
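Note that the one-liner above starts printing at the second header, so if a file was appended to more than once, the later headers and every block after the first are all kept. Below is a minimal sketch of a variant that keeps only the last block. It is not part of the original answer: the function name keep_last_block and the file name sample.dat are made up for illustration, and it assumes the header line is repeated verbatim. It reads the file twice rather than buffering a block in memory, which seems safer for large files.

```shell
# Build a small sample file (hypothetical name) laid out like the one in the question:
# a stale block first, then the corrected block.
cat > sample.dat <<'EOF'
H1 H2 H3 DATA SHIFT PROD VAL
12 12 13 8189 2 392 10
12 13 12 8199 3 281 11
H1 H2 H3 DATA SHIFT PROD VAL
12 12 13 8189 2 392 10
12 13 12 8199 3 281 15
EOF

# Hypothetical helper: print only the last header block of the given file.
# Pass 1 (NR==FNR): remember the header from line 1 and the line number
#                   of its last occurrence.
# Pass 2:           print every line from that last occurrence onward.
keep_last_block() {
  awk 'NR==FNR { if (FNR==1) h=$0; if ($0==h) last=FNR; next }
       FNR>=last' "$1" "$1"
}

keep_last_block sample.dat
```

To fix a file in place, one option is to write the result to a temporary file and move it over the original only if awk succeeded, e.g. keep_last_block file > file.tmp && mv file.tmp file.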
