Anil

Reputation: 123

word frequency in javascript


How can I implement a JavaScript function to calculate the frequency of each word in a given sentence?

This is my code:

function search () {
  var data = document.getElementById('txt').value;
  var temp = data;
  var words = new Array();
  words = temp.split(" ");
  var uniqueWords = new Array();
  var count = new Array();


  for (var i = 0; i < words.length; i++) {
    //var count=0;
    var f = 0;
    for (j = 0; j < uniqueWords.length; j++) {
      if (words[i] == uniqueWords[j]) {
        count[j] = count[j] + 1;
        //uniqueWords[j]=words[i];
        f = 1;
      }
    }
    if (f == 0) {
      count[i] = 1;
      uniqueWords[i] = words[i];
    }
    console.log("count of " + uniqueWords[i] + " - " + count[i]);
  }
}

I am unable to track down the problem. Any help is greatly appreciated. The output should be in this format: count of is - 1, count of the - 2, ...

input: this is anil is kum the anil

Upvotes: 8

Views: 32247

Answers (7)

Huzair Bahadur

Reputation: 11

I had a similar assignment. This is what I did:

Assignment: Clean the following text and find the most frequent word (hint: use replace and regular expressions).

const sentence = '%I $am@% a %tea@cher%, &and& I lo%#ve %te@a@ching%;. The@re $is no@th@ing; &as& mo@re rewarding as educa@ting &and& @emp%o@weri@ng peo@ple. ;I found tea@ching m%o@re interesting tha@n any ot#her %jo@bs. %Do@es thi%s mo@tiv#ate yo@u to be a tea@cher!? %Th#is 30#Days&OfJavaScript &is al@so $the $resu@lt of &love& of tea&ching'

console.log(`\n\n 03.Clean the following text and find the most frequent word (hint, use replace and regular expressions) \n\n ${sentence} \n\n`)

console.log(`Cleared sentence : ${sentence.replace(/[.,\/#!$%\^&\*;:{}=\-_`~()@]/g, "")}`)

console.log(mostFrequentWord(sentence))


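function mostFrequentWord(sentence) {
  // Strip punctuation, then normalize whitespace and case
  const cleaned = sentence.replace(/[.,\/#!$%\^&\*;:{}=\-_`~()@]/g, "").trim().toLowerCase()
  const sentenceArray = cleaned.split(/\s+/)
  const counts = new Map()
  let word = null
  let count = 0
  for (let i = 0; i < sentenceArray.length; i++) {
    // Count whole words; a regex match on the sentence would also count substrings
    const current = sentenceArray[i]
    const currentCount = (counts.get(current) || 0) + 1
    counts.set(current, currentCount)
    // Remember the word with the highest count seen so far
    if (currentCount > count) {
      count = currentCount
      word = current
    }
  }
  return `\n Count of most frequent word "${word}" is ${count}`
}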

Upvotes: 0

Asking

Reputation: 4192

const sentence = 'Hi my friend how are you my friend';

const countWords = (sentence) => {
    const convertToObject = sentence.split(" ").map( (i, k) => {
        return {
          element: {
              word: i,
              nr: sentence.split(" ").filter(j => j === i).length + ' occurrence',
          }

      }
  });
    return Array.from(new Set(convertToObject.map(JSON.stringify))).map(JSON.parse)
};

console.log(countWords(sentence));

Upvotes: 3

thdoan

Reputation: 19107

I'd go with Sampson's match-reduce method for slightly better efficiency. Here's a modified version of it that is more production-ready. It's not perfect, but it should cover the vast majority of scenarios (i.e., "good enough").

function calcWordFreq(s) {
  // Normalize
  s = s.toLowerCase();
  // Strip quotes and brackets
  s = s.replace(/["“”(\[{}\])]|\B['‘]([^'’]+)['’]/g, '$1');
  // Strip dashes and ellipses
  s = s.replace(/[‒–—―…]|--|\.\.\./g, ' ');
  // Strip punctuation marks
  s = s.replace(/[!?;:.,]\B/g, '');
  // match() returns null when there are no words, so fall back to an empty array
  return (s.match(/\S+/g) || []).reduce(function(oFreq, sWord) {
    if (oFreq.hasOwnProperty(sWord)) ++oFreq[sWord];
    else oFreq[sWord] = 1;
    return oFreq;
  }, {});
}

calcWordFreq('A ‘bad’, “BAD” wolf-man...a good ol\' spook -- I\'m frightened!') returns

{
  "a": 2,
  "bad": 2,
  "frightened": 1,
  "good": 1,
  "i'm": 1,
  "ol'": 1,
  "spook": 1,
  "wolf-man": 1
}

Upvotes: -1

Cymen

Reputation: 14419

Here is a JavaScript function to get the frequency of each word in a sentence:

function wordFreq(string) {
    var words = string.replace(/[.]/g, '').split(/\s/);
    var freqMap = {};
    words.forEach(function(w) {
        if (!freqMap[w]) {
            freqMap[w] = 0;
        }
        freqMap[w] += 1;
    });

    return freqMap;
}

It will return a hash of word to word count. So for example, if we run it like so:

console.log(wordFreq("I am the big the big bull."));
> Object {I: 1, am: 1, the: 2, big: 2, bull: 1}

You can iterate over the words in sorted order with Object.keys(result).sort().forEach(function(word) {...}). So we could hook that up like so:

var freq = wordFreq("I am the big the big bull.");
Object.keys(freq).sort().forEach(function(word) {
    console.log("count of " + word + " is " + freq[word]);
});

Which would output:

count of I is 1
count of am is 1
count of big is 2
count of bull is 1
count of the is 2

JSFiddle: http://jsfiddle.net/ah6wsbs6/

And here is the wordFreq function in ES6:

function wordFreq(string) {
  return string.replace(/[.]/g, '')
    .split(/\s/)
    .reduce((map, word) =>
      Object.assign(map, {
        [word]: (map[word])
          ? map[word] + 1
          : 1,
      }),
      {}
    );
}

JSFiddle: http://jsfiddle.net/r1Lo79us/
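
One caveat with using a plain object as the map: words such as "constructor" or "toString" collide with properties inherited from Object.prototype, so freqMap[w] is already truthy before the first increment. Below is a minimal sketch that sidesteps this with a Map and also splits on runs of whitespace. The name wordFreqMap is hypothetical and not part of the fiddles above.

```javascript
// Hypothetical variant of wordFreq that stores counts in a Map, so inherited
// property names such as "constructor" are counted like any other word.
function wordFreqMap(string) {
  var counts = new Map();
  string
    .replace(/[.]/g, '')
    .split(/\s+/)       // split on runs of whitespace
    .filter(Boolean)    // drop empty strings left by leading/trailing whitespace
    .forEach(function (w) {
      counts.set(w, (counts.get(w) || 0) + 1);
    });
  return counts;
}

var freq = wordFreqMap("the constructor of the big bull.");
freq.forEach(function (count, word) {
  console.log("count of " + word + " is " + count);
});
```

A Map keeps insertion order, so the words are printed in the order they first appear; spread the entries into an array and sort it if alphabetical output is needed.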

Upvotes: 23

Sampson

Reputation: 268364

I feel you have over-complicated things by having multiple arrays, strings, and engaging in frequent (and hard to follow) context-switching between loops, and nested loops.

Below is the approach I would encourage you to consider taking. I've inlined comments to explain each step along the way. If any of this is unclear, please let me know in the comments and I'll revisit to improve clarity.

(function () {

    /* Below is a regular expression that finds alphanumeric characters
       Next is a string that could easily be replaced with a reference to a form control
       Lastly, we have an array that will hold any words matching our pattern */
    var pattern = /\w+/g,
        string = "I I am am am yes yes.",
        matchedWords = string.match( pattern );

    /* The Array.prototype.reduce method assists us in producing a single value from an
       array. In this case, we're going to use it to output an object with results. */
    var counts = matchedWords.reduce(function ( stats, word ) {

        /* `stats` is the object that we'll be building up over time.
           `word` is each individual entry in the `matchedWords` array */
        if ( stats.hasOwnProperty( word ) ) {
            /* `stats` already has an entry for the current `word`.
               As a result, let's increment the count for that `word`. */
            stats[ word ] = stats[ word ] + 1;
        } else {
            /* `stats` does not yet have an entry for the current `word`.
               As a result, let's add a new entry, and set count to 1. */
            stats[ word ] = 1;
        }

        /* Because we are building up `stats` over numerous iterations,
           we need to return it for the next pass to modify it. */
        return stats;

    }, {} );

    /* Now that `counts` has our object, we can log it. */
    console.log( counts );

}());
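
One edge case to keep in mind: String.prototype.match with a global pattern returns null rather than an empty array when nothing matches, so calling reduce on the result would throw for an empty or punctuation-only string. Here is a small sketch of the same approach wrapped in a function with that guard. The name countMatchedWords is chosen here for illustration only.

```javascript
// Illustrative wrapper around the match-and-reduce approach above.
function countMatchedWords(text) {
    // match() returns null when there are no matches, so fall back to an empty array
    var matchedWords = text.match(/\w+/g) || [];

    return matchedWords.reduce(function (stats, word) {
        if (Object.prototype.hasOwnProperty.call(stats, word)) {
            stats[word] = stats[word] + 1;
        } else {
            stats[word] = 1;
        }
        return stats;
    }, {});
}

console.log(countMatchedWords("I I am am am yes yes.")); // { I: 2, am: 3, yes: 2 }
console.log(countMatchedWords("..."));                   // {}
```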

Upvotes: 19

Anurag Peshne

Reputation: 1547

While both of the answers here are correct, and maybe better, neither of them addresses the OP's question (what is wrong with his code).

The problem with OP's code is here:

if(f==0){
    count[i]=1;
    uniqueWords[i]=words[i];
}

For every new (unique) word, the code adds it to uniqueWords at the index the word had in words, not at the next free index. Hence there are gaps in the uniqueWords array, and count has the same gaps. This is the reason for some undefined values.

Try printing uniqueWords. It should give something like:

["this", "is", "anil", 4: "kum", 5: "the"]

Note that there is no element at index 3.

Also, the final counts should be printed only after all the words in the words array have been processed.

Here's a corrected version:

function search()
{
    var data=document.getElementById('txt').value;
    var temp=data;
    var words=new Array();
    words=temp.split(" ");
    var uniqueWords=new Array();
    var count=new Array();


    for (var i = 0; i < words.length; i++) {
        //var count=0;
        var f=0;
        for(j=0;j<uniqueWords.length;j++){
            if(words[i]==uniqueWords[j]){
                count[j]=count[j]+1;
                //uniqueWords[j]=words[i];
                f=1;
            }
        }
        if(f==0){
            count[i]=1;
            uniqueWords[i]=words[i];
        }
    }
    for ( i = 0; i < uniqueWords.length; i++) {
        if (typeof uniqueWords[i] !== 'undefined')
            console.log("count of "+uniqueWords[i]+" - "+count[i]);       
    }
}

I have just moved the printing of the counts out of the processing loop into a new loop and added an "if not undefined" check.

Fiddle: https://jsfiddle.net/cdLgaq3a/
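
As an alternative sketch (not the code in the fiddle), the gaps can be avoided altogether by appending each new word with push, so that uniqueWords and count stay dense and share the same indices. Here countDense is a hypothetical helper that takes the text as a parameter instead of reading it from the page.

```javascript
// Hypothetical helper: same two-array idea as the question, but new words are
// appended with push(), so no index is ever skipped.
function countDense(text) {
    var words = text.split(" ");
    var uniqueWords = [];
    var count = [];

    for (var i = 0; i < words.length; i++) {
        var j = uniqueWords.indexOf(words[i]);
        if (j === -1) {
            // First time this word is seen: append it and start its count at 1
            uniqueWords.push(words[i]);
            count.push(1);
        } else {
            // Word already known: increment the count stored at the same index
            count[j] = count[j] + 1;
        }
    }
    return { uniqueWords: uniqueWords, count: count };
}

var result = countDense("this is anil is kum the anil");
for (var k = 0; k < result.uniqueWords.length; k++) {
    console.log("count of " + result.uniqueWords[k] + " - " + result.count[k]);
}
```

Because both arrays are dense, the printing loop no longer needs the undefined check.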

Upvotes: 0

Lucien Stals

Reputation: 237

Here is an updated version of your own code...

<!DOCTYPE html>
<html>
<head>
<title>string frequency</title>
<style type="text/css">
#txt{
    width:250px;
}
</style>
</head>

<body >

<textarea id="txt" cols="25" rows="3" placeholder="add your text here">   </textarea><br>
<button type="button" onclick="search()">search</button>

    <script >

        function search()
        {
            var data=document.getElementById('txt').value;
            var temp=data;
            var words=new Array();
            words=temp.split(" ");

            var unique = {};


            for (var i = 0; i < words.length; i++) {
                var word = words[i];
                console.log(word);

                if (word in unique)
                {
                    console.log("word found");
                    var count  = unique[word];
                    count ++;
                    unique[word]=count;
                }
                else
                {
                    console.log("word NOT found");
                    unique[word]=1;
                }
            }
            console.log(unique);
        }

    </script>

</body>
</html>

I think your loop was overly complicated. Also, trying to produce the final count while still doing your first pass over the array of words is bound to fail because you can't test for uniqueness until you have checked each word in the array.

Instead of all your counters, I've used a JavaScript object as an associative array, so we can store each unique word along with the count of how many times it occurs.

Then, once we exit the loop, we can see the final result.

Also, this solution uses no regex ;)

I'll also add that it's very hard to count words just based on spaces. In this code, "one, two, one" will result in "one," and "one" being counted as different, unique words.
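
A minimal sketch of one way to soften that limitation, assuming it is acceptable to lower-case the text and to treat any run of characters other than ASCII letters, digits and apostrophes as a separator. Unlike the code above it does use a regex, and countCleanWords is a hypothetical name.

```javascript
// Hypothetical helper: normalizes the text before counting so that
// "one," and "one" are treated as the same word.
function countCleanWords(text) {
    var unique = {};
    var words = text
        .toLowerCase()
        .split(/[^a-z0-9']+/)   // split on anything that is not a letter, digit or apostrophe
        .filter(Boolean);       // drop empty strings produced at the edges

    for (var i = 0; i < words.length; i++) {
        var word = words[i];
        if (Object.prototype.hasOwnProperty.call(unique, word)) {
            unique[word]++;
        } else {
            unique[word] = 1;
        }
    }
    return unique;
}

console.log(countCleanWords("one, two, one")); // { one: 2, two: 1 }
```

Text with accented or non-Latin letters would need a different separator pattern, since this one only recognizes a-z and 0-9.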

Upvotes: 0
