Reputation: 343
I have a .CSV file which has a few records after a header. However, there is a duplicate header just before the end of the file, and after that duplicate header there are a few more records (which I do not need). Is there a way to check for the pattern of the header that occurs for the second time and delete the rest of the file after that duplicate header? Below is an example of the file.
col0,col1, col2, col3 , col4 , col5, col6 ,
1value0,1value1,1value2,1value3,1value4,1value5,1value6,
2value0, 2value1, 2value2, 2value3, 2value4, 2value5, 2value6,
3value0, 3value1, 3value2, 3value3, 3value4, 3value5, 3value6,
4value0, 4value1, 4value2, 4value3, 4value4, 4value5, 4value6,
5value0, 5value1, 5value2, 5value3, 5value4, 5value5, 5value6,
6value0, 6value1, 6value2, 6value3, 6value4, 6value5, 6value6,
,,,,,,,
,,,,,,,
,,,,,,,
(n-1)value0, (n-1)value1, (n-1)value2, (n-1)value3, (n-1)value4, (n-1)value5, (n-1)value6,
(n)value0, (n)value1, (n)value2, (n)value3, (n)value4, (n)value5, (n)value6,
col0,col1, col2, col3 , col4 , col5, col6 ,
1,unwanted, records, after, the, duplicate, header
2,unwanted, records, after, the, duplicate, header
3,unwanted, records, after, the, duplicate, header
Here is the output that I am expecting:
col0,col1, col2, col3 , col4 , col5, col6 ,
1value0,1value1,1value2,1value3,1value4,1value5,1value6,
2value0, 2value1, 2value2, 2value3, 2value4, 2value5, 2value6,
3value0, 3value1, 3value2, 3value3, 3value4, 3value5, 3value6,
4value0, 4value1, 4value2, 4value3, 4value4, 4value5, 4value6,
5value0, 5value1, 5value2, 5value3, 5value4, 5value5, 5value6,
6value0, 6value1, 6value2, 6value3, 6value4, 6value5, 6value6,
,,,,,,,
,,,,,,,
,,,,,,,
(n-1)value0, (n-1)value1, (n-1)value2, (n-1)value3, (n-1)value4, (n-1)value5, (n-1)value6,
(n)value0, (n)value1, (n)value2, (n)value3, (n)value4, (n)value5, (n)value6,
P.S.: I have GNU sed version 4.1.5 and GNU Awk 3.1.5.
Any help is highly appreciated.
Upvotes: 1
Views: 169
Reputation: 1125
Probably way more complicated than it needs to be:
awk 'BEGIN{flag=0} $0==head{flag=1}; NR==1{head=$0}; flag==0{print $0}' file
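For readability, here is roughly the same logic spelled out as a commented script (a sketch; it exits at the first repeat of the header instead of carrying a flag, and assumes the duplicate header is byte-for-byte identical to the first line):
awk '
    NR==1            { head = $0 }   # remember the first line (the header)
    NR>1 && $0==head { exit }        # stop at the first repeat of the header
                     { print }       # otherwise pass the line through
' file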
Upvotes: 2
Reputation: 58558
This might work for you (GNU sed 4.2.1):
sed 's/,/\n/8;T;s/\n.*//;q' file
This works by trying to replace the 8th , with a newline; if that fails, it bails out (the T command) and prints the line as usual. Most lines (in your example) have only 7 commas, so they are left alone, whereas the line containing the duplicate header is shortened and then printed out when processing quits (the q command).
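To illustrate the mechanism on a toy line rather than the OP's data: a line with at least 8 commas is cut back to its first 8 comma-separated fields and processing stops, while a shorter line passes through untouched.
echo 'a,b,c,d,e,f,g,h,i' | sed 's/,/\n/8;T;s/\n.*//;q'
# prints: a,b,c,d,e,f,g,h
echo 'a,b,c' | sed 's/,/\n/8;T;s/\n.*//;q'
# prints: a,b,c   (no 8th comma, so the T branch skips the rest of the script)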
Upvotes: 2
Reputation: 1385
Try
awk '/col1, col2, col3 , col4 , col5, col6/{d++} d<2{print}' file
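Note that none of these commands modify the file itself; if the goal is to actually delete the trailing part, one way (a sketch, where file.tmp is just a scratch name of your choosing) is to redirect the output and move it back:
awk '/col1, col2, col3 , col4 , col5, col6/{d++} d<2{print}' file > file.tmp && mv file.tmp file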
Upvotes: 0