Reputation: 343
I have a .CSV file which has a few records after a header. However, there is a duplicate header just before the end of the file, and after that duplicate header there are a few more records (which I do not need). Is there a way to check for the pattern of the header that occurs for the second time and delete the rest of the file after that duplicate header? Below is an example of the file.
col0,col1, col2, col3 , col4 , col5, col6 ,
1value0,1value1,1value2,1value3,1value4,1value5,1value6,
2value0, 2value1, 2value2, 2value3, 2value4, 2value5, 2value6,
3value0, 3value1, 3value2, 3value3, 3value4, 3value5, 3value6,
4value0, 4value1, 4value2, 4value3, 4value4, 4value5, 4value6,
5value0, 5value1, 5value2, 5value3, 5value4, 5value5, 5value6,
6value0, 6value1, 6value2, 6value3, 6value4, 6value5, 6value6,
,,,,,,,
,,,,,,,
,,,,,,,
(n-1)value0, (n-1)value1, (n-1)value2, (n-1)value3, (n-1)value4, (n-1)value5, (n-1)value6,
(n)value0, (n)value1, (n)value2, (n)value3, (n)value4, (n)value5, (n)value6,
col0,col1, col2, col3 , col4 , col5, col6 ,
1,unwanted, records, after, the, duplicate, header
2,unwanted, records, after, the, duplicate, header
3,unwanted, records, after, the, duplicate, header
Here is the output that I am expecting:
col0,col1, col2, col3 , col4 , col5, col6 ,
1value0,1value1,1value2,1value3,1value4,1value5,1value6,
2value0, 2value1, 2value2, 2value3, 2value4, 2value5, 2value6,
3value0, 3value1, 3value2, 3value3, 3value4, 3value5, 3value6,
4value0, 4value1, 4value2, 4value3, 4value4, 4value5, 4value6,
5value0, 5value1, 5value2, 5value3, 5value4, 5value5, 5value6,
6value0, 6value1, 6value2, 6value3, 6value4, 6value5, 6value6,
,,,,,,,
,,,,,,,
,,,,,,,
(n-1)value0, (n-1)value1, (n-1)value2, (n-1)value3, (n-1)value4, (n-1)value5, (n-1)value6,
(n)value0, (n)value1, (n)value2, (n)value3, (n)value4, (n)value5, (n)value6,
P.S.: I have GNU sed version 4.1.5 and GNU Awk 3.1.5.
Any help is highly appreciated.
Upvotes: 1
Views: 169
Reputation: 1125
Probably way more complicated than it needs to be:
awk 'BEGIN{flag=0} $0==head{flag=1}; NR==1{head=$0}; flag==0{print $0}' file
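For readability, here is roughly the same logic spelled out as a commented script (a sketch; it exits at the first repeat of the header instead of carrying a flag, and assumes the duplicate header is byte-for-byte identical to the first line):
awk '
    NR==1            { head = $0 }   # remember the first line (the header)
    NR>1 && $0==head { exit }        # stop at the first repeat of the header
                     { print }       # otherwise pass the line through
' file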
Upvotes: 2
Reputation: 58558
This might work for you (GNU sed 4.2.1):
sed 's/,/\n/8;T;s/\n.*//;q' file
This works by trying to replace the 8th , with a newline; if that fails, it bails out (the T command) and prints the line as usual. Most lines (in your example) have only 7 commas, so they are left alone, whereas the line containing the duplicate header is shortened and then printed out when processing quits (the q command).
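To illustrate the mechanism on a toy line rather than the OP's data: a line with at least 8 commas is cut back to its first 8 comma-separated fields and processing stops, while a shorter line passes through untouched.
echo 'a,b,c,d,e,f,g,h,i' | sed 's/,/\n/8;T;s/\n.*//;q'
# prints: a,b,c,d,e,f,g,h
echo 'a,b,c' | sed 's/,/\n/8;T;s/\n.*//;q'
# prints: a,b,c   (no 8th comma, so the T branch skips the rest of the script)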
Upvotes: 2
Reputation: 1385
Try
awk '/col1, col2, col3 , col4 , col5, col6/{d++} d<2{print}' file
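Note that none of these commands modify the file itself; if the goal is to actually delete the trailing part, one way (a sketch, where file.tmp is just a scratch name of your choosing) is to redirect the output and move it back:
awk '/col1, col2, col3 , col4 , col5, col6/{d++} d<2{print}' file > file.tmp && mv file.tmp file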
Upvotes: 0