Reputation: 2487
Brief:
The system will load CSV files that are expected to be huge (1M+ lines). I already have an idea of how to process them using queues and background jobs/tasks.
However, I want to show the user the progress on their file, something like: 2165 of 1246875, or maybe a percentage. To achieve this, I need to know the number of lines in the file, but I have to do this without loading its contents into memory, so that it is fast: as soon as I receive an upload, I can save the filename along with the total number of lines found in it.
In PHP this is possible using SplFileObject: calling seek() with PHP_INT_MAX moves to the last line it can reach in the file, and key() then returns that line number.
But the system is being built entirely in JavaScript/Node.js, so for convenience I want to build this part of the system in JavaScript as well.
How could I accomplish that? I already took a look at the FS API, but didn't find how to do this.
[EDIT]
Ideas so far:
child_process.exec + wc -l (Unix only)
FileReader (Delegate resources to the user)
Upvotes: 1
Views: 2368
Reputation: 2487
This is impossible.
Lines are a human concept about a file. For computers, files are just a bunch of bytes; you can know the total number of bytes and you can seek to a byte offset, but knowing how many lines those bytes contain involves counting line breaks, and counting line breaks involves reading them.
Both wc and PHP's SplFileObject stream the entire file; they don't do magic. So the best answer is whichever method does this most efficiently, that is, whichever one the garbage collector copes with best.
On the other hand, if accuracy is not a requirement, you can try to guess. If all lines have a fixed length in bytes, you can divide the total bytes of the file by that length. Or, as pointed out by Aikon, you can read just a few bytes, break them into lines, take the average line length, and divide the total bytes of the file by it.
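A minimal sketch of that guessing approach, assuming the first bytes of the file are representative of the rest. The function name estimateLineCount and the 64 KiB default sample size are made up for illustration; the result is only an estimate, not an exact count.

```javascript
const fs = require("fs");

// Hypothetical helper: estimates the number of lines from a sample taken
// at the start of the file. Assumes line lengths in the sample are
// representative of the whole file.
function estimateLineCount(filename, sampleBytes = 64 * 1024) {
  const totalBytes = fs.statSync(filename).size;
  if (totalBytes === 0) return 0;

  const fd = fs.openSync(filename, "r");
  try {
    const buffer = Buffer.alloc(Math.min(sampleBytes, totalBytes));
    const bytesRead = fs.readSync(fd, buffer, 0, buffer.length, 0);

    // Count line feeds (byte 10) and remember where the last one is,
    // so a partial line at the end of the sample is not averaged in.
    let newlines = 0;
    let lastNewline = -1;
    for (let i = 0; i < bytesRead; i++) {
      if (buffer[i] === 10) {
        newlines++;
        lastNewline = i;
      }
    }

    // No line break in the sample: nothing to average, fall back to 1.
    if (newlines === 0) return 1;

    const averageLineBytes = (lastNewline + 1) / newlines;
    return Math.round(totalBytes / averageLineBytes);
  } finally {
    fs.closeSync(fd);
  }
}
```

The estimate can be shown to the user right after the upload and replaced by the exact count once the file has actually been streamed.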
Although it does bring file content into memory, Joel Lord's answer is the answer for a Node.js solution. You can also take a look at the readline module.
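For the readline route, something along these lines should work. It is only a sketch: the function name countLines is made up, and the file is still read from start to end, just one chunk at a time.

```javascript
const fs = require("fs");
const readline = require("readline");

// Hypothetical helper: resolves with the number of lines in the file.
function countLines(filename) {
  return new Promise((resolve, reject) => {
    const input = fs.createReadStream(filename);
    // crlfDelay: Infinity makes "\r\n" always count as a single line break.
    const rl = readline.createInterface({ input, crlfDelay: Infinity });
    let lines = 0;

    input.on("error", reject);
    rl.on("line", () => {
      lines++;
    });
    rl.on("close", () => resolve(lines));
  });
}

// Usage (file name taken from the command line):
// countLines(process.argv[2]).then((n) => console.log("Lines: " + n));
```

readline also emits a final line that has no trailing line break, so the last row of a CSV is counted either way.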
Upvotes: 2
Reputation: 2173
You would use a stream as documented here
The following example should count the number of lines in a file, using the first argument as the file name.
i.e.: node countlines.js nameoffiletocountthelines.csv
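var fs = require("fs");
var lines = 0;
var lastByte = null;
// Using the first argument as the filename
var filename = process.argv[2];
var stream = fs.createReadStream(filename);
// When data is received, check every byte of the chunk and increment
// the line counter on each line feed (10). Counting carriage returns
// (13) as well would count every Windows-style CRLF line ending twice.
stream.on("data", function(chunk) {
  for (var i = 0; i < chunk.length; i++) {
    if (chunk[i] === 10) lines++;
  }
  if (chunk.length > 0) lastByte = chunk[chunk.length - 1];
});
// When the file processing is done, count a last line that has no
// trailing line feed, then echo the number of lines
stream.on("end", function() {
  if (lastByte !== null && lastByte !== 10) lines++;
  console.log("Lines: " + lines);
});
// Report read errors (e.g. a missing file) instead of crashing
stream.on("error", function(err) {
  console.error("Could not read " + filename + ": " + err.message);
});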
Upvotes: 0