Reputation: 9664
I have a .list file containing information on movies. The file is formatted as follows:
New Distribution Votes Rank Title
0000000125 1176527 9.2 The Shawshank Redemption (1994)
0000000125 817264 9.2 The Godfather (1972)
0000000124 538216 9.0 The Godfather: Part II (1974)
0000000124 1142277 8.9 The Dark Knight (2008)
0000000124 906356 8.9 Pulp Fiction (1994)
The code I have so far is as follows:
// modules I'll be using
var fs = require('fs');
var csv = require('csv');
csv().from.path('files/info.txt', { delimiter: ' '})
.to.array(function(data){
console.log(data);
});
But because the values are separated by single spaces, double spaces, and tabs, there is no single delimiter to use. How can I extract this information into an array?
Upvotes: 1
Views: 282
Reputation: 15550
You can collapse each run of whitespace into a single tab and then parse the result as a string, like this:
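var fs = require('fs');
var csv = require('csv');
fs.readFile('files/info.txt', 'utf8', function (err, csvdata) {
  if (err) {
    return console.log(err);
  }
  // Collapse runs of spaces and tabs only; \s+ would also swallow the newlines between rows.
  var movies = csvdata.replace(/[ \t]+/g, "\t");
  // Note: spaces inside a title also become tabs, so each title is split across several columns.
  csv().from.string(movies, { delimiter: '\t'})
    .to.array(function(data){
      console.log(data);
    });
});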
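If the titles need to stay in one piece, one option is to skip the csv module and split each line by hand. The following is only a sketch: `splitRow` is a hypothetical helper, the sample data is inlined rather than read from `files/info.txt`, and it assumes the first three whitespace-separated tokens are always the distribution, votes, and rank.

```javascript
// Hypothetical helper: split one line into its four columns.
function splitRow(line) {
  // Trim the ends, then cut on runs of spaces or tabs.
  var parts = line.trim().split(/[ \t]+/);
  return [
    parts[0],                 // distribution
    parts[1],                 // votes
    parts[2],                 // rank
    parts.slice(3).join(' ')  // title, re-joined with single spaces
  ];
}

// Inlined sample mixing single spaces, multiple spaces, and a tab.
var sample = [
  'New Distribution Votes Rank Title',
  '0000000125  1176527   9.2  The Shawshank Redemption (1994)',
  '0000000124\t538216 9.0 The Godfather: Part II (1974)'
].join('\n');

var rows = sample
  .split('\n')
  .slice(1) // skip the header row
  .filter(function (line) { return line.trim() !== ''; }) // ignore blank lines
  .map(splitRow);

console.log(rows);
```

Re-joining the title with single spaces loses any original double spacing inside a title, which is probably acceptable for this data.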
Upvotes: 3
Reputation: 14156
It looks easy to parse with regex:
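var fs = require('fs');

function parse(row) {
  var match = row.match(/\s{6}(\d*)\s{2}(\d*)\s{3}(\d*\.\d)/);
  if (!match) return null; // row does not follow the expected layout
  return {
    distribution: match[1],
    votes: match[2],
    rank: match[3]
  };
}
fs.readFileSync(file, 'utf8') // pass an encoding to get a string instead of a Buffer
  .split('\n')
  .slice(1) // since we don't care about the first row
  .map(parse);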
I will leave you to build the rest of the regex. I use two tools to do so: rubular.com and the Node.js REPL.
The fragment \s{6}(\d*)\s{2}(\d*) means: match six whitespace characters, capture a run of digits, match two whitespace characters, capture another run of digits, and so on. Note that \s matches any whitespace character (including tabs), not only spaces.
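One possible way to finish the regex is sketched below. It is an assumption based on the sample rows in the question, not a full description of the file: it uses \s+ instead of fixed counts so that single spaces, double spaces, and tabs all work, and it expects every title to end with a four-digit year in parentheses. The names `ROW` and `parseRow` are made up for this example.

```javascript
// Assumed layout: distribution, votes, rank, then "Title (YYYY)".
var ROW = /^\s*(\d+)\s+(\d+)\s+(\d+\.\d)\s+(.+?)\s+\((\d{4})\)\s*$/;

function parseRow(row) {
  var m = row.match(ROW);
  if (!m) return null; // header, blank line, or unexpected layout
  return {
    distribution: m[1],        // kept as a string to preserve leading zeros
    votes: parseInt(m[2], 10),
    rank: parseFloat(m[3]),
    title: m[4],
    year: parseInt(m[5], 10)
  };
}

var movie = parseRow('      0000000125  1176527   9.2  The Shawshank Redemption (1994)');
var header = parseRow('New Distribution Votes Rank Title');

console.log(movie);
console.log(header);
```

Rows that do not fit the pattern come back as null, so they can be filtered out after the .map(...) call instead of throwing.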
Upvotes: 0