user4725754

Count recurrent words in two files

I have code that counts word occurrences in a file. I would like to use it with two files and display the recurring words (those that both files contain) in a separate table. How can I adapt it to work with two files?

    while ((inputLine = bufferedReader.readLine()) != null) {
        String[] words = inputLine.split("[ \n\t\r.,;:!?(){}]");

        for (int counter = 0; counter < words.length; counter++) {
            String key = words[counter].toLowerCase();
            if (key.length() > 0) {
                if (crunchifyMap.get(key) == null) {
                    crunchifyMap.put(key, 1);
                } else {
                    int value = crunchifyMap.get(key).intValue();
                    value++;
                    crunchifyMap.put(key, value);
                }
            }
        }
    }
    Set<Map.Entry<String, Integer>> entrySet = crunchifyMap.entrySet();
    System.out.println("Words" + "\t\t" + "# of Occurrences");
    for (Map.Entry<String, Integer> entry : entrySet) {
        System.out.println(entry.getKey() + "\t\t" + entry.getValue());
    }

Upvotes: 2

Views: 210

Answers (2)

isnot2bad

Reputation: 24464

You should probably use the following (very coarse) algorithm:

  1. Read the first file and store all words in a Set words;
  2. Read the second file and store all words in a Set words2;
  3. Compute the intersecting set by retaining all words in words that are also contained in words2: words.retainAll(words2)
  4. words contains your final list.

Note that you can reuse the file-reading algorithm if you put it into a method like

public Set<String> readWords(Reader reader) {
    ....
}
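
One possible way to fill in such a method, together with the intersection step, is sketched below. It reuses the delimiter pattern from the question; the class name CommonWords, the helper commonWords, and the sample sentences are made up for illustration, and a StringReader stands in for a FileReader so the sketch runs without any files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class name, used only for this sketch.
class CommonWords {

    // Reads all distinct, lower-cased words from the reader,
    // splitting on the same delimiters as the code in the question.
    public static Set<String> readWords(Reader reader) {
        Set<String> words = new HashSet<>();
        try (BufferedReader bufferedReader = new BufferedReader(reader)) {
            String inputLine;
            while ((inputLine = bufferedReader.readLine()) != null) {
                for (String word : inputLine.split("[ \n\t\r.,;:!?(){}]")) {
                    if (!word.isEmpty()) {
                        words.add(word.toLowerCase());
                    }
                }
            }
        } catch (IOException e) {
            // Rethrown unchecked so the signature matches the stub above.
            throw new UncheckedIOException(e);
        }
        return words;
    }

    // Returns the words contained in both sets; neither argument is modified.
    public static Set<String> commonWords(Set<String> first, Set<String> second) {
        Set<String> common = new HashSet<>(first);
        common.retainAll(second);
        return common;
    }

    public static void main(String[] args) {
        Set<String> words = readWords(new StringReader("The quick brown fox."));
        Set<String> words2 = readWords(new StringReader("A lazy fox, the end!"));
        // Contains "the" and "fox"; a HashSet gives no guaranteed order.
        System.out.println(commonWords(words, words2));
    }
}
```

For real files, something like readWords(new FileReader("first.txt")) should work the same way; the file name is only an example.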

Count frequency of occurrence

If you also want to know the frequency of occurrence, you should read each file into a Map<String, Integer> which maps each word to its frequency of occurrence within that file.

The new Map.merge(...) function (since Java 8) simplifies counting:

Map<String, Integer> freq = new HashMap<>();
for(String word : words) {
    // insert 1 or increment already mapped value
    freq.merge(word, 1, Integer::sum);
}

Then apply the following, slightly modified algorithm:

  1. Read the first file and store all words in a Map wordsFreq1;
  2. Read the second file and store all words in a Map wordsFreq2;
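  3. Copy the words from the first map into a new set: Set<String> words = new HashSet<>(wordsFreq1.keySet()). The copy matters because keySet() returns a view of the map, so calling retainAll on it directly would also remove entries from wordsFreq1.
  4. Compute the intersection by retaining all words from the second map: words.retainAll(wordsFreq2.keySet())
  5. Now words contains all the words in common, and wordsFreq1 and wordsFreq2 still hold the frequencies of all words in their respective files.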

With these three data structures, you can easily get all information you want. Example:

    Map<String, Integer> wordsFreq1 = ... // read from file
    Map<String, Integer> wordsFreq2 = ... // read from file

    Set<String> commonWords = new HashSet<>(wordsFreq1.keySet());
    commonWords.retainAll(wordsFreq2.keySet());

    // Map that contains the summarized frequencies of all words
    Map<String, Integer> allWordsTotalFreq = new HashMap<>(wordsFreq1);
    wordsFreq2.forEach((word, freq) -> allWordsTotalFreq.merge(word, freq, Integer::sum));

    // Map that contains the summarized frequencies of words in common
    Map<String, Integer> commonWordsTotalFreq = new HashMap<>(allWordsTotalFreq);
    commonWordsTotalFreq.keySet().retainAll(commonWords);

    // List of common words sorted by frequency:
    List<String> list = new ArrayList<>(commonWords);
    Collections.sort(list, Comparator.comparingInt(commonWordsTotalFreq::get).reversed());
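
To tie this back to the table asked for in the question, here is a rough, self-contained sketch that reads two inputs into frequency maps and prints one row per common word. The names CommonWordTable, readFrequencies and commonWordTable are invented for this example, the column layout is just one option, and StringReader again replaces real files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class name, used only for this sketch.
class CommonWordTable {

    // Maps each lower-cased word to its number of occurrences in the input.
    public static Map<String, Integer> readFrequencies(Reader reader) {
        Map<String, Integer> freq = new HashMap<>();
        try (BufferedReader bufferedReader = new BufferedReader(reader)) {
            String inputLine;
            while ((inputLine = bufferedReader.readLine()) != null) {
                for (String word : inputLine.split("[ \n\t\r.,;:!?(){}]")) {
                    if (!word.isEmpty()) {
                        // insert 1 or increment the already mapped value
                        freq.merge(word.toLowerCase(), 1, Integer::sum);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return freq;
    }

    // Builds a tab-separated table: word, count in file 1, count in file 2.
    public static String commonWordTable(Map<String, Integer> freq1,
                                         Map<String, Integer> freq2) {
        // TreeSet copies the keys (leaving freq1 untouched) and sorts them.
        Set<String> common = new TreeSet<>(freq1.keySet());
        common.retainAll(freq2.keySet());

        StringBuilder table = new StringBuilder("Word\tFile 1\tFile 2\n");
        for (String word : common) {
            table.append(word).append('\t')
                 .append(freq1.get(word)).append('\t')
                 .append(freq2.get(word)).append('\n');
        }
        return table.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> wordsFreq1 = readFrequencies(new StringReader("the cat and the dog"));
        Map<String, Integer> wordsFreq2 = readFrequencies(new StringReader("The dog barks."));
        System.out.print(commonWordTable(wordsFreq1, wordsFreq2));
    }
}
```

With the two sample sentences, the table has a header row followed by rows for "dog" (1 and 1) and "the" (2 and 1). Sorting alphabetically via TreeSet is a choice made here for stable output; sorting by total frequency as in the snippet above would work just as well.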

Upvotes: 2

Steve

Reputation: 633

Use similar code to read the second file, and look up each of its words in the map built from the first file:

while ((inputLine = bufferedReader.readLine()) != null) {
    String[] words = inputLine.split("[ \n\t\r.,;:!?(){}]");

    for (int counter = 0; counter < words.length; counter++) {
        String key = words[counter].toLowerCase();
        if (key.length() > 0) {
            if (crunchifyMap.get(key) == null) {
                continue;
            } else if(duplicateMap.get(key) == null) {
                duplicateMap.put(key, 1);
            }
        }
    }
}

// Print every word that occurred in both files.
Set<Map.Entry<String, Integer>> duplicateEntrySet = duplicateMap.entrySet();
System.out.println("Duplicate words:");
for (Map.Entry<String, Integer> entry : duplicateEntrySet) {
    System.out.println(entry.getKey());
}
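
Since duplicateMap only ever stores the value 1, a Set could express the same idea. The following is a minimal sketch of that variant, not a drop-in replacement: the class and method names are hypothetical, firstFileCounts plays the role of crunchifyMap, and a StringReader is used in place of the second file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical class name, used only for this sketch.
class SecondFilePass {

    // Collects the words of the second input that are also keys of the
    // first file's count map, in order of first appearance.
    public static Set<String> wordsAlsoInFirstFile(Map<String, Integer> firstFileCounts,
                                                   Reader secondFile) {
        Set<String> duplicates = new LinkedHashSet<>();
        try (BufferedReader bufferedReader = new BufferedReader(secondFile)) {
            String inputLine;
            while ((inputLine = bufferedReader.readLine()) != null) {
                for (String word : inputLine.split("[ \n\t\r.,;:!?(){}]")) {
                    String key = word.toLowerCase();
                    // add() ignores words that were already recorded
                    if (!key.isEmpty() && firstFileCounts.containsKey(key)) {
                        duplicates.add(key);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Stand-in for the map filled while reading the first file.
        Map<String, Integer> crunchifyMap = new HashMap<>();
        crunchifyMap.put("the", 2);
        crunchifyMap.put("fox", 1);

        Set<String> duplicates = wordsAlsoInFirstFile(
                crunchifyMap, new StringReader("A lazy fox, the end! The fox."));
        System.out.println("Duplicate words:");
        for (String word : duplicates) {
            System.out.println(word);
        }
    }
}
```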

Upvotes: 0
