Reputation: 63
I have created two HashMaps that contain strings from two separate txt files.
Now, I am trying to compare the two HashMaps and count the number of duplicate values that each file contains. For example, if file1 and file2 both contain the string "hello" twice, my console should print: hello occurs 2 times.
Here is my first HashMap:
List<String> word_list = new ArrayList<>();
//Load your words to the word_list here
while (INPUT_TEXT1.hasNext()) {
String input_word = INPUT_TEXT1.next();
word_list.add(input_word);
}
INPUT_TEXT1.close();
String regexPattern = "[^a-zA-Z]";
int index = 0;
for (String s : word_list) {
word_list.set(index++, s.replaceAll(regexPattern, "").toLowerCase());
}
//Find the unique words now from list
String[] uniqueWords = word_list.stream().distinct().
toArray(size -> new String[size]);
Map<String, Integer> wordsMap = new HashMap<>();
int frequency = 0;
//Load the words to Map with each uniqueword as Key and frequency as Value
for (String uniqueWord : uniqueWords) {
frequency = Collections.frequency(word_list, uniqueWord);
System.out.println(uniqueWord+" occurred "+frequency+" times");
wordsMap.put(uniqueWord, frequency);
}
//Now, Sort the words with the reverse order of frequency(value of HashMap)
Stream<Entry<String, Integer>> topWords = wordsMap.entrySet().stream().
sorted(Map.Entry.<String,Integer>comparingByValue().reversed()).limit(6);
//Now print the Top 5 words to console
System.out.println("Top 5 Words:::");
topWords.forEach(System.out::println);
System.out.println("\n\n");
Here is my second HashMap:
List<String> wordList = new ArrayList<>();
//Load your words to the word_list here
while (INPUT_TEXT2.hasNext()) {
String input_word1 = INPUT_TEXT2.next();
wordList.add(input_word1);
}
INPUT_TEXT2.close();
String regex = "[^a-zA-Z]";
int index1 = 0;
for (String s : wordList) {
wordList.set(index1++, s.replaceAll(regex, "").toLowerCase());
}
String[] uniqueWords1 = wordList.stream().distinct().
toArray(size -> new String[size]);
Map<String, Integer> wordsMap1 = new HashMap<>();
//Load the words to Map with each uniqueword as Key and frequency as Value
for (String uniqueWord : uniqueWords1) {
frequency = Collections.frequency(wordList, uniqueWord);
System.out.println(uniqueWord+" occurred "+frequency+" times");
wordsMap1.put(uniqueWord, frequency);
}
//Now, Sort the words with the reverse order of frequency(value of HashMap)
Stream<Entry<String, Integer>> topWords1 = wordsMap1.entrySet().stream().
sorted(Map.Entry.<String,Integer>comparingByValue().reversed()).limit(6);
Here is my original approach to finding the duplicate values:
boolean val = wordsMap.keySet().containsAll(wordsMap1.keySet());
for (Entry<String, Integer> str : wordsMap.entrySet()) {
System.out.println("================= " + str.getKey());
if(wordsMap1.containsKey(str.getKey())){
System.out.println("Map2 Contains Map 1 Key");
}
}
System.out.println("================= " + val);
Does anyone have any other suggestions for achieving this? Thank you
EDIT: How could I count the number of occurrences of each individual value?
Upvotes: 1
Views: 1143
Reputation: 1938
I think your code already works as it is. If your goal is a cleaner way to implement the last check, you could try this:
Set<String> keySetMap1 = new HashSet<String>(wordsMap.keySet());
Set<String> keySet2 = wordsMap1.keySet();
keySetMap1.retainAll(keySet2);
keySetMap1.stream().forEach(x -> System.out.println("Map2 Contains Map 1 Key: "+x));
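To also address the EDIT about counts, here is a minimal sketch that walks one map and looks each word up in the other. It assumes that "duplicate" means a word present in both files, and that when the two counts differ the smaller one should be reported; that choice is a guess, since the question only shows the case where both counts are equal. The class name `CommonWordCounts` and the helper `commonCounts` are hypothetical names used for illustration, and the two small maps in `main` stand in for `wordsMap` and `wordsMap1` from the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class CommonWordCounts {

    // Hypothetical helper: for every word present in both maps, keep the
    // smaller of its two counts (an assumption about what "duplicate" means).
    static Map<String, Integer> commonCounts(Map<String, Integer> first,
                                             Map<String, Integer> second) {
        // TreeMap keeps the words sorted so the printed order is stable.
        Map<String, Integer> common = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : first.entrySet()) {
            Integer other = second.get(entry.getKey());
            if (other != null) {
                common.put(entry.getKey(), Math.min(entry.getValue(), other));
            }
        }
        return common;
    }

    public static void main(String[] args) {
        // Stand-in data; in the question these maps are filled from the two files.
        Map<String, Integer> wordsMap = new HashMap<>();
        wordsMap.put("hello", 2);
        wordsMap.put("world", 1);

        Map<String, Integer> wordsMap1 = new HashMap<>();
        wordsMap1.put("hello", 2);
        wordsMap1.put("java", 3);

        // Prints: hello occurs 2 times
        commonCounts(wordsMap, wordsMap1)
                .forEach((word, count) ->
                        System.out.println(word + " occurs " + count + " times"));
    }
}
```

If you would rather see both counts side by side, or their sum, only the `Math.min(...)` expression needs to change.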
Upvotes: 3