Reputation: 43
I'm looking for a way to remove the first n lines from csv files.
Basically I've been given a dump of several hundred csv files with the task of creating a queryable MySQL database. Each file has a legend in non-csv format taking up the first ~10 lines, which throws an error when I attempt to import into MySQL. The legend is variable in length, as not all files have the same number of parameters.
I'm looking for a way to remove the legend, and the only pattern I can find is that the first csv element is always the second instance of the word year.
The files look something like this. I want each file to start at the second instance of lower-case year.
Legend:
non-csv text...
year: Year
... etc
(csv format) year, month, day, etc...
I've looked at sed commands to loop through each file but can't find one that does exactly what I want, e.g.:
find . -name "*.csv" |
while IFS= read -r filename;
do
  # Quote the name so paths with spaces survive; write one output per input
  sed -n '/year/,$p' "$filename" > "$filename.new";
done;
This removes all text before the first instance of year, but I'm unfamiliar with sed and can't figure out how to make it skip to the second instance. I tried the above in a recursive function but it didn't work.
Any suggestions?
Upvotes: 2
Views: 67
Reputation: 58558
This might work for you (GNU sed):
sed ':a;N;s/year/&/2;Ta;s/.*\n//' file
This gathers up lines until the second appearance of year and then deletes all lines up to but not including the current line.
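For anyone reading the one-liner for the first time, here is the same script spread over several lines with comments. It is only a sketch: the function name strip_legend_sed is made up for illustration, and it assumes GNU sed, since T is a GNU extension.

```shell
# Hypothetical helper: print file "$1" starting at the line on which the
# second occurrence of "year" appears. Assumes GNU sed (the T command).
strip_legend_sed() {
    sed '
        :a
        # Append the next input line to the pattern space.
        N
        # Succeeds only once the pattern space holds two occurrences of "year".
        s/year/&/2
        # Loop back to :a while that substitution has not succeeded.
        Ta
        # Drop everything up to and including the last newline.
        s/.*\n//
    ' "$1"
}
```

After the legend is stripped, the remaining lines are collected by the same loop and printed when GNU sed reaches the end of the input. Note that two occurrences of year on a single legend line would also trigger the cut.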
Upvotes: 1
Reputation: 67557
awk to the rescue!
$ awk '/year/{c++} c>1' file
(csv format) year, month, day, etc...
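Since the question involves several hundred files, the awk filter can be wrapped in a loop. This is a rough sketch rather than a tested production script: the function names and the .clean suffix are assumptions chosen for illustration, and file names containing newlines are not handled.

```shell
# Print file "$1" starting at the second line that contains "year".
strip_legend() {
    awk '/year/ { c++ } c > 1' "$1"
}

# Apply strip_legend to every *.csv under directory "$1", writing each
# result next to the original with a hypothetical ".clean" suffix.
# The original files are left untouched.
strip_legends_in_dir() {
    find "$1" -type f -name '*.csv' | while IFS= read -r f; do
        strip_legend "$f" > "$f.clean"
    done
}
```

Calling strip_legends_in_dir . from the top of the dump would leave a foo.csv.clean beside each foo.csv. Like the one-liner, this counts lines that contain year, so it relies on the legend mentioning year on exactly one line before the csv header.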
Upvotes: 3