Count recurrent words in two files

Question

I have code that counts word occurrences in a file. I would like to use it with two files and display the recurrent words (those that both files contain) in a separate table. How can I adapt it to work with two files?

    while ((inputLine = bufferedReader.readLine()) != null) {
        String[] words = inputLine.split("[ \n\t\r.,;:!?(){}]");

        for (int counter = 0; counter < words.length; counter++) {
            String key = words[counter].toLowerCase();
            if (key.length() > 0) {
                if (crunchifyMap.get(key) == null) {
                    crunchifyMap.put(key, 1);
                } else {
                    int value = crunchifyMap.get(key).intValue();
                    value++;
                    crunchifyMap.put(key, value);
                }
            }
        }
    }
    Set<Map.Entry<String, Integer>> entrySet = crunchifyMap.entrySet();
    System.out.println("Words" + "\t\t" + "# of Occurrences");
    for (Map.Entry<String, Integer> entry : entrySet) {
        System.out.println(entry.getKey() + "\t\t" + entry.getValue());
    }

isnot2bad · Accepted Answer

You should probably use the following (very coarse) algorithm:

1. Read the first file and store all words in a Set<String> words.
2. Read the second file and store all words in a Set<String> words2.
3. Compute the intersection by retaining all words in words that are also contained in words2: words.retainAll(words2)
4. words now contains your final list.

Note that you can reuse the file-reading algorithm if you put it into a method like

public Set<String> readWords(Reader reader) {
    ....
}
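The steps above could be sketched roughly as follows. This is only one possible shape: the class name CommonWords, the commonWords helper, and the sample sentences are made up for illustration, and the split pattern is borrowed from the question. It uses sorted sets so the output order is predictable.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class CommonWords {

    // Hypothetical helper: collects the distinct lower-cased words of one input.
    public static Set<String> readWords(Reader reader) {
        return new BufferedReader(reader).lines()
                .flatMap(line -> Arrays.stream(line.split("[ \n\t\r.,;:!?(){}]")))
                .filter(word -> !word.isEmpty())
                .map(String::toLowerCase)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // Returns the words present in both sets without modifying the arguments.
    public static Set<String> commonWords(Set<String> first, Set<String> second) {
        Set<String> common = new TreeSet<>(first);
        common.retainAll(second);
        return common;
    }

    public static void main(String[] args) {
        // StringReader stands in for the two files here (assumed sample input).
        Set<String> words = readWords(new StringReader("The quick brown fox."));
        Set<String> words2 = readWords(new StringReader("A lazy fox; the end!"));
        System.out.println(commonWords(words, words2)); // prints [fox, the]
    }
}
```

For real files, the StringReader could be swapped for something like Files.newBufferedReader(path), which requires handling IOException.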

Count frequency of occurrence

If you also want to know how often each word occurs, read each file into a Map<String, Integer> that maps each word to its frequency of occurrence within that file.

The new Map.merge(...) function (since Java 8) simplifies counting:

Map<String, Integer> freq = new HashMap<>();
for(String word : words) {
    // insert 1 or increment already mapped value
    freq.merge(word, 1, Integer::sum);
}
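Wrapped into a reusable method, the counting loop might look like the sketch below. The class name WordFrequencies, the method name readWordFrequencies, and the sample sentence are assumptions for illustration. The split pattern is the one from the question, and a TreeMap is used only to get a predictable print order.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

public class WordFrequencies {

    // Hypothetical helper: maps each lower-cased word to its number of occurrences.
    public static Map<String, Integer> readWordFrequencies(Reader reader) {
        Map<String, Integer> freq = new TreeMap<>();
        new BufferedReader(reader).lines().forEach(line -> {
            for (String word : line.split("[ \n\t\r.,;:!?(){}]")) {
                if (!word.isEmpty()) {
                    // insert 1 or increment the already mapped value
                    freq.merge(word.toLowerCase(), 1, Integer::sum);
                }
            }
        });
        return freq;
    }

    public static void main(String[] args) {
        // StringReader stands in for a file here (assumed sample input).
        Map<String, Integer> freq =
                readWordFrequencies(new StringReader("To be, or not to be?"));
        System.out.println(freq); // prints {be=2, not=1, or=1, to=2}
    }
}
```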

Then apply the following, slightly modified algorithm:

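1. Read the first file and store all words in a Map<String, Integer> wordsFreq1.
2. Read the second file and store all words in a Map<String, Integer> wordsFreq2.
3. Copy the words from the first map: Set<String> words = new HashSet<>(wordsFreq1.keySet()). The copy matters because keySet() returns a view, so calling retainAll on it directly would also remove entries from wordsFreq1.
4. Compute the intersection by retaining all words from the second map: words.retainAll(wordsFreq2.keySet())
5. Now words contains all the words in common, and wordsFreq1 and wordsFreq2 still hold the frequencies of all words of both files.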

With these three data structures, you can easily get all information you want. Example:

    Map<String, Integer> wordsFreq1 = ... // read from file
    Map<String, Integer> wordsFreq2 = ... // read from file

    Set<String> commonWords = new HashSet<>(wordsFreq1.keySet());
    commonWords.retainAll(wordsFreq2.keySet());

    // Map that contains the summarized frequencies of all words
    Map<String, Integer> allWordsTotalFreq = new HashMap<>(wordsFreq1);
    wordsFreq2.forEach((word, freq) -> allWordsTotalFreq.merge(word, freq, Integer::sum));

    // Map that contains the summarized frequencies of words in common
    Map<String, Integer> commonWordsTotalFreq = new HashMap<>(allWordsTotalFreq);
    commonWordsTotalFreq.keySet().retainAll(commonWords);

    // List of common words sorted by frequency:
    List<String> list = new ArrayList<>(commonWords);
    Collections.sort(list, Comparator.comparingInt(commonWordsTotalFreq::get).reversed());
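To get the separate table of common words the question asks for, the two frequency maps could be printed side by side. The following is a minimal sketch under assumptions: the class name CommonWordTable, the format method, the column headers, and the sample frequencies are all invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class CommonWordTable {

    // Builds one tab-separated row per common word: word, count in file 1, count in file 2.
    public static String format(Map<String, Integer> wordsFreq1,
                                Map<String, Integer> wordsFreq2) {
        // Copy the key set so that retainAll does not modify wordsFreq1.
        Set<String> commonWords = new TreeSet<>(wordsFreq1.keySet());
        commonWords.retainAll(wordsFreq2.keySet());

        StringBuilder table = new StringBuilder("Word\tFile 1\tFile 2\n");
        for (String word : commonWords) {
            table.append(word).append('\t')
                 .append(wordsFreq1.get(word)).append('\t')
                 .append(wordsFreq2.get(word)).append('\n');
        }
        return table.toString();
    }

    public static void main(String[] args) {
        // Made-up sample frequencies standing in for the two files.
        Map<String, Integer> wordsFreq1 = new HashMap<>();
        wordsFreq1.put("fox", 2);
        wordsFreq1.put("quick", 1);
        Map<String, Integer> wordsFreq2 = new HashMap<>();
        wordsFreq2.put("fox", 1);
        wordsFreq2.put("lazy", 3);
        System.out.print(format(wordsFreq1, wordsFreq2));
    }
}
```

With the sample data, only "fox" appears in both maps, so the table has a header row followed by a single row for that word.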
