URS

Reputation: 25

Frequent file reads vs accessing huge Array

I have a list of 40,000 words from which I want to frequently return 20-40 words at random using JavaScript (Node.js), in response to client-side requests. Would it be better to read the list from a file every time, or to store it once in an array and access it from there?

Upvotes: 1

Views: 74

Answers (4)

FDavidov

Reputation: 3675

Your question does not specify how you would select words randomly if they were picked from a file, so I'll give an answer based on a hunch.

I/O against the computer's local disk will always be slower than accessing data in memory. If your data (the words) has a flat arrangement (i.e. a simple array with 40,000 entries), you can load it into an in-memory array and access words by randomizing the index you pick from the array.
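A minimal sketch of that approach (assuming the words sit one per line in a hypothetical words.txt) could look like this:

// Load the word list once at startup; a synchronous read is fine since it happens only once
var fs = require('fs');
var words = fs.readFileSync('words.txt', 'utf8').split('\n').filter(Boolean);

// Return `count` words by randomizing the index (a word may occasionally repeat)
function pickRandomWords(count) {
  var picked = [];
  for (var i = 0; i < count; i++) {
    picked.push(words[Math.floor(Math.random() * words.length)]);
  }
  return picked;
}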

Upvotes: 4

Abdennour TOUMI

Reputation: 93193

Huge Array (∈ RAM):

If you have enough RAM, loading your file's lines into a huge array is better. Just don't forget to raise the memory limit when running your Node.js app:

node --max_old_space_size=2000 index.js   # default is 512 MB

Then load the file's lines into an array:

var JFile = require('jfile');
var words = new JFile('words.txt'); // words.lines is an array of the file's lines
var between20_40 = Math.floor(Math.random() * (40 - 20 + 1)) + 20;
// Shuffle with a random comparator, then take the first n lines (n between 20 and 40)
var random_words = words.lines.sort(function () { return Math.random() - 0.5; }).slice(0, between20_40);

Access file (∈ HDD):

If your RAM capacity is small and you are worried about it, accessing the file directly is better:

var spawn = require('child_process').spawn;
var between20_40 = Math.floor(Math.random() * (40 - 20 + 1)) + 20;
// Let the shell shuffle the file and take the first n lines (GNU sort's -R option)
var shellSyntaxCommand = `sort -R words.txt | head -n ${between20_40}`;
spawn('sh', ['-c', shellSyntaxCommand], { stdio: 'inherit' }); // prints the selected lines to stdout
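Note that with stdio: 'inherit' the selected lines go to the parent process's stdout rather than into a variable. If you need them in memory (e.g. to send back to the client), a sketch that captures stdout instead (still assuming GNU sort with its -R option) could be:

var chunks = [];
var child = spawn('sh', ['-c', shellSyntaxCommand]); // default stdio gives us pipes to read from
child.stdout.on('data', function (chunk) { chunks.push(chunk); });
child.on('close', function () {
  var random_words = Buffer.concat(chunks).toString().trim().split('\n');
  // random_words now holds the 20-40 randomly selected lines
});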

Conclusion:

Working with data in RAM is much faster than going through the HDD. Thus, if you have enough RAM capacity, the first option is recommended.

Upvotes: 1

Frank Roth

Reputation: 6339

I had exactly the same use case within a blacklist algorithm. I found that loading 40,000 words into a JS object is no problem at all, so rereading the file every time you want those values is completely unnecessary.

So your solution should be: load once, then read as many times as you like via the array index (do not iterate over the array).
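A minimal sketch of that pattern, relying on Node's module cache so the file is read only once (the file name and helper names are just placeholders):

// wordlist.js -- required modules are cached, so this read happens only once
var fs = require('fs');
var words = fs.readFileSync('words.txt', 'utf8').split('\n').filter(Boolean);

module.exports = {
  size: words.length,
  // Plain index access, no iteration over the array
  at: function (i) { return words[i]; }
};

Elsewhere: var wordlist = require('./wordlist'); var word = wordlist.at(Math.floor(Math.random() * wordlist.size));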

Upvotes: 1

bulmad

Reputation: 1

It is much faster and easier to access the words from the array: you simply randomize an index and extract the word at that index, whereas reading random words from the file requires more tedious methods. How well file-based reading works also depends on how the words are laid out in the file.

Upvotes: 0
