Reputation: 2708
Currently I'm using node-csv (http://www.adaltas.com/projects/node-csv/) for CSV file parsing.
Is there a way to skip the first few lines of the file before starting to parse the data? Some CSV reports, for example, have report details in the first few lines before the actual headers and data start:
LOG REPORT <- data about the report
DATE: 1.1.1900
DATE,EVENT,MESSAGE <- data headers
1.1.1900,LOG,Hello World! <- actual data starts here
Upvotes: 11
Views: 24439
Reputation: 1231
All you need to do is pass the option { from_line: 2 } to the parse() function, like in the snippet below:
const fs = require('fs');
const parse = require('csv-parse');

fs.createReadStream('path/to/file')
  .pipe(parse({ delimiter: ',', from_line: 2 }))
  .on('data', (row) => {
    // parsing starts from the 2nd line of the file
    console.log(row);
  });
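Note that in csv-parse v5 and later the parser is a named export, so on a recent version the require line becomes:

const { parse } = require('csv-parse');

The rest of the snippet stays the same.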
Upvotes: 12
Reputation: 9582
Assuming you're using v0.4 or greater with the new refactor (i.e. csv-generate, csv-parse, stream-transform, and csv-stringify), you can use the built-in transform to skip the first line, with a bit of extra work.
var fs = require('fs'),
    csv = require('csv');

var skipHeader = true; // config option

var read = fs.createReadStream('in.csv'),
    write = fs.createWriteStream('out.jsonish'),
    parse = csv.parse(),
    rowCount = 0, // to keep track of where we are
    transform = csv.transform(function(row, cb) {
      var result;
      if (skipHeader && rowCount === 0) { // if the option is turned on and this is the first line
        result = null; // pass null to cb to skip
      } else {
        result = JSON.stringify(row) + '\n'; // otherwise apply the transform however you want
      }
      rowCount++; // next time we're not at the first line anymore
      cb(null, result); // let node-csv know we're done transforming
    });

read
  .pipe(parse)
  .pipe(transform)
  .pipe(write)
  .once('finish', function() {
    // done
  });
Essentially we track the number of rows that have been transformed, and if we're on the very first one (and we do in fact wish to skip the header via the skipHeader bool), we pass null to the callback as the second param (the first one is always the error); otherwise we pass the transformed result.
This will also work with synchronous parsing, but requires a change since there is no callback in synchronous mode; a sketch of the synchronous variant follows after the link below. The same logic could also be applied to the older v0.2 library, since it also has row transforming built in.
See http://csv.adaltas.com/transform/#skipping-and-creating-records
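For reference, a minimal sketch of the same skip logic in synchronous mode, assuming the stream-transform convention that a one-argument handler runs synchronously and that returning null drops the record:

transform = csv.transform(function(row) {
  if (skipHeader && rowCount++ === 0) {
    return null; // returning null skips the record
  }
  return JSON.stringify(row) + '\n'; // otherwise return the transformed row directly
});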
This is pretty easy to apply, and IMO has a pretty low footprint. Usually you want to keep track of rows processed for status purposes anyway, and I almost always transform the result set before sending it to a Writable, so it is very simple to add in the extra logic to check for skipping the header. The added benefit here is that we're using the same module for the skipping logic as for parsing/transforming - no extra dependencies are needed.
Upvotes: 7
Reputation: 47993
You have two options here:
You can process the file line by line with the readline module. I posted a code snippet in an earlier answer, which you can use:
var fs = require('fs'),
    readline = require('readline');

// the input/output streams the snippet assumes
var instream = fs.createReadStream('in.csv'),
    outstream = fs.createWriteStream('out.csv');

var rl = readline.createInterface({
  input: instream,
  output: outstream,
  terminal: false
});

rl.on('line', function(line) {
  console.log(line);
  // Do your stuff ...
  // Then write to outstream
  rl.write(line);
});
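To actually skip the report preamble with this approach, you could count lines as they arrive and drop the first few. A minimal sketch, assuming a report.csv file and a made-up linesToSkip count:

var fs = require('fs'),
    readline = require('readline');

var linesToSkip = 2, // number of report-detail lines before the real headers
    lineCount = 0;

var rl = readline.createInterface({
  input: fs.createReadStream('report.csv'),
  terminal: false
});

rl.on('line', function(line) {
  if (lineCount++ < linesToSkip) return; // discard the preamble
  console.log(line); // from here on it's the real CSV; parse or pipe it onward
});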
You can give an offset to your file stream, which will skip those bytes. You can see it in the documentation:
fs.createReadStream('sample.txt', {start: 90, end: 99});
This is much easier if you know the offset is fixed.
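For example, if the preamble were known to be exactly 42 bytes long (the offset and file name here are made up for illustration), you could start the stream right after it and pipe the remainder into the parser:

var fs = require('fs');
var parse = require('csv-parse');

fs.createReadStream('report.csv', { start: 42 })
  .pipe(parse({ columns: true })) // first remaining line becomes the header row
  .on('data', function(row) {
    console.log(row);
  });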
Upvotes: 6