Remove text up to Nth instance of pattern match in csv files

Question

I'm looking for a way to remove the first n lines from csv files.

Basically I've been given a dump of several hundred csv files with the task of creating a queryable MySQL database. The files have a legend in non-csv format taking up the first ~10 lines and throw an error when attempting to import to MySQL. The legend is variable in length as not all files have the same number of parameters.

I'm looking for a way to remove the legend and the only pattern I can find is that the first csv element is always the second instance of the word year.

The files look something like this; I want each file to start at the second instance of lower-case year.

Legend:
non-csv text...
year: Year
... etc

(csv format) year, month, day, etc...

I've looked at sed commands to loop through each file but can't find one that achieves exactly what I want, e.g.:

find . -name "*.csv" |
while read -r filename;
do
  sed -n '/year/,$p' "$filename" > newFile.csv;
done;

This removes all text before the first instance of year but I'm unfamiliar with sed and can't figure out how to make it skip to the second instance. I tried the above in a recursive function but it didn't work.

Any suggestions?

karakfa · Accepted Answer

awk to the rescue!

$ awk '/year/{c++} c>1' file

(csv format) year, month, day, etc...
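
The one-liner works by counting lines that match year and printing every line once that count has passed 1, so output starts at the second matching line. Note that it counts matching lines, not individual occurrences of the word. To run it over a whole directory of files, something like the following might do. It is only a sketch: the function names strip_legend and strip_all_legends, the .tmp suffix, and the choice to leave a file untouched when it has fewer than N matches are all assumptions for illustration, not part of the original answer.

```shell
# Hypothetical helper: print a file from its Nth pattern-matching line onward.
# $1 = file, $2 = pattern (default: year), $3 = N (default: 2)
strip_legend() {
  awk -v pat="${2:-year}" -v n="${3:-2}" '$0 ~ pat { c++ } c >= n+0' "$1"
}

# Hypothetical helper: apply strip_legend to every .csv under directory $1
# (default: current directory), editing each file in place via a temp file.
# A file is replaced only if the stripped result is non-empty, so files with
# fewer than two matching lines are left as they were.
# Assumes file names contain no newlines.
strip_all_legends() {
  find "${1:-.}" -type f -name '*.csv' | while IFS= read -r f; do
    strip_legend "$f" > "$f.tmp" && [ -s "$f.tmp" ] && mv "$f.tmp" "$f" || rm -f "$f.tmp"
  done
}
```

Calling strip_all_legends with the directory holding the dump would then rewrite each file in place. Because the function skips files with fewer than two matching lines, running it twice should not strip a file a second time, but trying it on a copy of the data first is probably wise.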
