wazzaday

Reputation: 9664

How to extract data from a .list file with node.js

I have a .list file containing information on movies. The file is formatted as follows:

New  Distribution  Votes  Rank  Title
      0000000125  1176527   9.2  The Shawshank Redemption (1994)
      0000000125  817264   9.2  The Godfather (1972)
      0000000124  538216   9.0  The Godfather: Part II (1974)
      0000000124  1142277   8.9  The Dark Knight (2008)
      0000000124  906356   8.9  Pulp Fiction (1994)

The code I have so far is as follows:

// modules I'll be using
var fs = require('fs');
var csv = require('csv');

csv().from.path('files/info.txt', { delimiter: '  '})
.to.array(function(data){
    console.log(data);
});

But the values are separated by a mix of single spaces, double spaces and tabs, so there is no single delimiter to use. How can I extract this information into an array?
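Since `String.prototype.split` accepts a regular expression, one regex can stand in for a mixed delimiter. Here is a minimal sketch. It assumes, as in the sample rows, that columns are separated by a tab or by two or more spaces, while titles contain only single spaces. If some columns really are separated by a single space, this assumption does not hold. `splitColumns` is a hypothetical helper name.

```javascript
// Sketch: split each row on a regex instead of a fixed delimiter.
// Assumption (true for the sample rows): columns are separated by a tab
// or by two or more spaces/tabs, while titles only contain single spaces.
const sample = [
  '      0000000125  1176527   9.2  The Shawshank Redemption (1994)',
  '      0000000124  538216   9.0  The Godfather: Part II (1974)',
].join('\n');

function splitColumns(line) {
  // trim() drops the leading indentation so the first field isn't empty;
  // the regex matches a run of 2+ spaces/tabs, or a single tab
  return line.trim().split(/[ \t]{2,}|\t/);
}

const rows = sample.split('\n').map(splitColumns);
console.log(rows);
```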

Upvotes: 1

Views: 282

Answers (2)

Hüseyin BABAL

Reputation: 15550

You can collapse the runs of whitespace into a single tab with a regex and then parse the result as a string, like this:

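var fs = require('fs');
var csv = require('csv'); // the same csv module used in the question

fs.readFile('files/info.txt', 'utf8', function (err, csvdata) {
  if (err) {
    return console.log(err);
  }
  // Strip the leading indentation so the first field isn't empty, then
  // collapse runs of spaces/tabs into a single tab. Use [ \t]+ rather than
  // \s+ so newlines are kept and each movie stays on its own row.
  // Note: this also splits titles that contain spaces into several columns.
  var movies = csvdata.replace(/^[ \t]+/gm, '').replace(/[ \t]+/g, '\t');

  csv().from.string(movies, { delimiter: '\t' })
    .to.array(function (data) {
      console.log(data);
    });
});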
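The same whitespace-collapsing idea also works without the csv module if you split each line on `/\s+/`. Here is a minimal sketch. It assumes the first three fields are always Distribution, Votes and Rank, so everything after them can be re-joined as the title. `parseList` is a hypothetical name, and the inline `text` stands in for the file contents.

```javascript
// Sketch: parse the movie list without the csv module.
// Each row is split on any run of whitespace; the first three fields are
// assumed to be Distribution, Votes and Rank, and everything after them is
// re-joined with single spaces as the title.
const text = [
  'New  Distribution  Votes  Rank  Title',
  '      0000000125  1176527   9.2  The Shawshank Redemption (1994)',
  '      0000000125  817264   9.2  The Godfather (1972)',
  '',
].join('\n');

function parseList(data) {
  return data
    .split(/\r?\n/)
    .slice(1) // skip the header row
    .filter((line) => line.trim() !== '') // ignore blank lines (e.g. trailing newline)
    .map((line) => {
      const fields = line.trim().split(/\s+/);
      return {
        distribution: fields[0], // kept as a string to preserve leading zeros
        votes: Number(fields[1]),
        rank: Number(fields[2]),
        title: fields.slice(3).join(' '),
      };
    });
}

const movies = parseList(text);
console.log(movies);
```

With the real file, something like `parseList(fs.readFileSync('files/info.txt', 'utf8'))` should work. Note that any run of spaces inside a title comes back as a single space.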

Upvotes: 3

José F. Romaniello

Reputation: 14156

It looks easy to parse with a regex:

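var fs = require('fs');

function parse(row) {
  // captures Distribution, Votes and Rank; the title is not captured yet
  var match = row.match(/\s{6}(\d*)\s{2}(\d*)\s{3}(\d*\.\d)/);
  return {
    distribution: match[1],
    votes: match[2],
    rank: match[3]
  };
}

var file = 'files/info.txt'; // path taken from the question
var movies = fs.readFileSync(file, 'utf8') // 'utf8' returns a string, not a Buffer
  .split('\n')
  .slice(1) // since we don't care about the first row
  .filter(function (row) { return row.trim() !== ''; }) // skip blank lines, e.g. the trailing newline
  .map(parse);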

I'll leave it to you to build the rest of the regex. I use two tools for this: rubular.com and the node.js REPL.

The pattern \s{6}(\d*)\s{2}(\d*) means: match 6 whitespace characters, then capture an arbitrary number of digits, then match 2 whitespace characters, then capture another arbitrary number of digits, and so on.
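Building the rest of the regex could look something like this. It is a sketch that assumes Distribution is all digits, as in the sample. It swaps the fixed counts (`\s{6}`, `\s{2}`, `\s{3}`) for `\s+` and adds a fourth group for the title. `ROW` is a hypothetical name.

```javascript
// Sketch: one regex that captures all four columns, including the title.
// \s+ replaces fixed counts like \s{6}, so rows whose spacing differs
// still match; (.+?) takes the rest of the line (minus trailing spaces).
const ROW = /^\s*(\d+)\s+(\d+)\s+(\d+\.\d)\s+(.+?)\s*$/;

function parse(row) {
  const match = row.match(ROW);
  if (!match) return null; // header, blank or malformed line
  return {
    distribution: match[1],
    votes: Number(match[2]),
    rank: Number(match[3]),
    title: match[4],
  };
}

const lines = [
  'New  Distribution  Votes  Rank  Title',
  '      0000000124  1142277   8.9  The Dark Knight (2008)',
  '      0000000124  906356   8.9  Pulp Fiction (1994)',
];

const parsed = lines.map(parse).filter((movie) => movie !== null);
console.log(parsed);
```

Returning `null` for lines that don't match means the header and blank lines can simply be filtered out instead of being sliced off by position.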

Upvotes: 0
