Miss_Orchid

Reputation: 374

Remove duplicate information in .dat file (awk, sed)

I have several large files to which the correct information was accidentally appended instead of replacing the old contents. Each file therefore contains the data twice: the first block (header plus data) is incorrect, and the most recent block, which starts at the second header, is correct:

H1 H2 H3 DATA SHIFT PROD VAL
12 12 13 8189 2 392 10
12 13 12 8199 3 281 11
...
...
H1 H2 H3 DATA SHIFT PROD VAL
12 12 13 8189 2 392 10
12 13 12 8199 3 281 15
...
...

How can I delete only the first header and the data belonging to it, keeping everything from the second header onward? I can only get sed to match 1 or 2 characters, and that deletes everything after the match, not before it.

The expected output should simply be:

H1 H2 H3 DATA SHIFT PROD VAL
12 12 13 8189 2 392 10
12 13 12 8199 3 281 15
...
...

Upvotes: 0

Views: 129

Answers (2)

stack0114106

Reputation: 8711

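Try this Perl solution. It saves the first line as the header and starts printing once that same line appears a second time: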

$ perl -ne ' $x=$_ if $.==1; $y++ if $.>1 and $x eq $_; print if $y ' simpson.txt
H1 H2 H3 DATA SHIFT PROD VAL
12 12 13 8189 2 392 10
12 13 12 8199 3 281 15
...
...

$

Upvotes: 0

karakfa

Reputation: 67497

awk to the rescue!

$ awk 'NR==1{h=$0; next} $0==h{p++} p' file

H1 H2 H3 DATA SHIFT PROD VAL
12 12 13 8189 2 392 10
12 13 12 8199 3 281 15
...
...

This records the first line as the header, skips everything until that same line appears again, and prints from there on.
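If a file might have been appended to more than once, the one-liner above would print from the second header onward and so keep every later copy. The following is a minimal two-pass sketch, not part of the original answer, that keeps only the block after the last occurrence of the header. It assumes the header is exactly the first line of the file and that the file can be read twice. The sample data and the `tmp` and `result` names are hypothetical.

```shell
# Hypothetical sample file in which the header appears three times.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
H1 H2 H3 DATA SHIFT PROD VAL
12 13 12 8199 3 281 11
H1 H2 H3 DATA SHIFT PROD VAL
12 13 12 8199 3 281 12
H1 H2 H3 DATA SHIFT PROD VAL
12 12 13 8189 2 392 10
12 13 12 8199 3 281 15
EOF

# Pass 1 (NR==FNR): store the header and the line number of its last occurrence.
# Pass 2: print every line from that line number onward.
result=$(awk 'NR==FNR { if (FNR==1) h=$0; if ($0==h) last=FNR; next }
              FNR>=last' "$tmp" "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Reading the file twice avoids holding a large file in memory, at the cost of a second scan. The same file name is passed to awk twice on purpose.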

Upvotes: 1
