Reputation: 1081
Hello, I need to read a text of almost 300,000 words, look up the global frequency of each word in an input dictionary, and collect the results in one array. I have a file of sentences and a dictionary file of words with their frequencies. This is my code:
const sentenceFreq = [];
let text = ""; // accumulate the cleaned sentences as one string
for (const sentence of srcSentences) {
  // remove special characters
  const sentenceWithoutSpecial = sentence.srcLangContent
    .replace(/[`~!@#$%^&*„“()_|+\-=?;:'",.<>\{\}\[\]\\\/]/gi, "");
  text = text + sentenceWithoutSpecial + " ";
}
const words = text.replace(/[.]/g, "").split(/\s/);
// for every word, scan the whole dictionary (this is the slow part)
words.forEach((w) => {
  const frequency = eng.filter((x) => x.word.toLowerCase() === w.toLowerCase());
  if (frequency[0]) {
    sentenceFreq.push({[frequency[0].freq]: w});
  } else {
    sentenceFreq.push({0: w});
  }
});
This is the English dictionary:
let eng = [
{word:"the",freq:23135851162},
{word:"of",freq:13151942776},
{word:"and",freq:12997637966},
{word:"to",freq:12136980858},
{word:"a",freq:9081174698},
{word:"in",freq:8469404971}
....]
So if my text is "Today is beautiful day", the code should look up each word in the eng dictionary and return its frequency, so the result would be [{1334:"today"},{521:"is"},{678854:"beautiful"},{9754334:"day"}]
The numbers 1334, 521, ... are the frequencies found in the eng dictionary.
The problem is that this is too slow, since I have 300,000 words. Is there a more efficient way to take an array of words and find each one in the array of English dictionary words?
For example, if I have the array ['today', 'is', 'good', 'day'], can I look up all of its values in the eng array at once instead of looping through each word?
Upvotes: 0
Views: 80
Reputation: 1404
Rather than using an array of objects like [ {word: "text", freq: 4} ] for your lookup, build a single object whose property names are the words and whose values are their frequencies. Then you can map your words array to the final output:
const myString = "Today is beautiful day. I like to walk and go in the forest.";
const cleanText = myString.replace(/[`~!@#$%^&*„“()_|+\-=?;:'",.<>\{\}\[\]\\\/]/gi, "");
const eng = [
{word:"the",freq:23135851162},
{word:"of",freq:13151942776},
{word:"and",freq:12997637966},
{word:"to",freq:12136980858},
{word:"a",freq:9081174698},
{word:"in",freq:8469404971}
];
const myEng = eng.reduce((obj, {word, freq}) => { // reduce all the values in the "eng" array to a single object
  obj[word] = freq; // assuming there are no duplicates, each word should have a new entry
  return obj; // return the object for the next iteration to use
}, {}); // the empty object here is the "obj" value on the first iteration
console.log("New object with fast lookup:\n", myEng);
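const wordsArr = cleanText.split(/\s+/).filter(Boolean); // split on any whitespace and drop empty strings
const out = wordsArr.map((word) => {
  // look up the lowercased word so "Today" matches "today", as in the question's code
  const freq = myEng[word.toLowerCase()] || 0; // freq = the dictionary value if it exists, else 0
  return { [freq]: word }; // replace the word in the array with an object in format { frequency : word }
});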
console.log("Output:\n", out);
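This will be much faster: each property lookup takes roughly constant time, so you no longer scan the whole eng array for every word. For N words and a dictionary of M entries, the work drops from about N × M comparisons to about N + M operations.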
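As a variation on the same idea, here is a minimal sketch that uses a Map instead of a plain object. A plain object inherits keys such as "constructor" from Object.prototype, so a word like that can produce a bogus hit, while a Map only holds the keys you put in. The sample data, the name freqByWord, and the helper wordFrequencies are made up for illustration, and the sketch assumes the text has already been stripped of punctuation.

```javascript
// Hypothetical sample data in the same shape as the question's `eng` array.
const eng = [
  { word: "the", freq: 23135851162 },
  { word: "to", freq: 12136980858 },
  { word: "in", freq: 8469404971 },
];

// Build the lookup once: lowercased word -> frequency.
const freqByWord = new Map(
  eng.map(({ word, freq }) => [word.toLowerCase(), freq])
);

// Hypothetical helper: returns one { [frequency]: word } object per word,
// using 0 for words that are not in the dictionary.
function wordFrequencies(text) {
  return text
    .split(/\s+/)     // split on any run of whitespace
    .filter(Boolean)  // drop empty strings from leading/trailing spaces
    .map((word) => ({ [freqByWord.get(word.toLowerCase()) ?? 0]: word }));
}

const result = wordFrequencies("The constructor walks to the forest");
console.log(result);
```

Building the Map is a single pass over the dictionary, and each get call afterwards is an average constant-time lookup, so the dictionary is never rescanned per word.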
Upvotes: 1