MADA

Reputation: 21

What type of data structure for counting word pairs in Java?

I am attempting to count word pairs in a text file. My goal is to map every word in the text to the word that follows it, and then count the duplicate key/value pairs. I am not concerned with the order. My code currently uses a HashMap to store each word pair, but with a HashMap I lose duplicate entries. If my text file contains "FIRST SECOND THIRD FIRST SECOND", my output is:

    FIRST   [SECOND]
    SECOND  []
    THIRD   [FIRST]

So when a key repeats, the new value overwrites what was previously stored for it. Brandon Ling helped me on a previous post, but I was not clear with him about my goal. I now realize a HashMap may not work.
Any help would be appreciated.

 import java.io.File;
 import java.io.FileInputStream;
 import java.io.FileNotFoundException;
 import java.io.InputStream;
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.Map.Entry;
 import java.util.Scanner;

 public class Assignment1
 {
     // returns an InputStream that gets data from the named file,
     // or null if no file with this name exists
     private static InputStream getFileInputStream(String fileName)
     {
         try {
             return new FileInputStream(new File(fileName));
         }
         catch (FileNotFoundException e) {
             System.err.println(e.getMessage());
             return null;
         }
     }

     public static void main(String[] args)
     {
         // read from the named file if one is given, otherwise from standard input
         InputStream in = (args.length > 0) ? getFileInputStream(args[0]) : System.in;
         if (in == null)
         {
             return;
         }

         System.out.printf("CS261 - Assignment 1 - AdamDavis%n%n");

         // use a Scanner to read one word at a time until the end of input
         List<String> inputWords = new ArrayList<String>();
         try (Scanner sc = new Scanner(in))
         {
             while (sc.hasNext())
             {
                 inputWords.add(sc.next());
             }
         }
         System.out.println("number of words is " + inputWords.size());

         // map each word to the list of words that follow it
         Map<String, List<String>> wordPairs = new HashMap<String, List<String>>();
         String previousWord = null;
         for (String currentWord : inputWords)
         {
             // note: put() replaces any list already stored under currentWord
             wordPairs.put(currentWord, new ArrayList<String>());
             if (previousWord != null)
             {
                 wordPairs.get(previousWord).add(currentWord);
             }
             previousWord = currentWord;
         }

         // print every word with the words recorded after it
         for (Entry<String, List<String>> pair : wordPairs.entrySet())
         {
             System.out.println(pair.getKey() + "\t" + pair.getValue());
         }
     }
 }
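In the code above, every word triggers wordPairs.put(currentWord, new ArrayList<String>()), which replaces the list that word already had. That is why FIRST's first follower disappears. The sketch below is a minimal illustration and assumes Java 9+ for List.of. It keeps a HashMap but creates each list only once with computeIfAbsent, then counts the followers with merge. FollowersDemo and followers are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FollowersDemo {
    // Maps each word to every word that follows it, keeping duplicates.
    static Map<String, List<String>> followers(List<String> words) {
        Map<String, List<String>> result = new HashMap<>();
        for (int i = 0; i + 1 < words.size(); i++) {
            // computeIfAbsent creates the list only the first time a word is seen,
            // so later occurrences append to it instead of replacing it
            result.computeIfAbsent(words.get(i), k -> new ArrayList<>()).add(words.get(i + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = List.of("FIRST", "SECOND", "THIRD", "FIRST", "SECOND");
        for (Map.Entry<String, List<String>> e : followers(words).entrySet()) {
            // count how often each follower appears after this key
            Map<String, Integer> counts = new HashMap<>();
            for (String next : e.getValue()) {
                counts.merge(next, 1, Integer::sum);
            }
            System.out.println(e.getKey() + " -> " + counts);
        }
    }
}
```

For the sample text, FIRST ends up with [SECOND, SECOND], so its SECOND count is 2. HashMap iteration order is unspecified.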

Upvotes: 2

Views: 1480

Answers (3)

Tesseract

Reputation: 8139

You can use Java 8 streams to build a map from each word pair to the number of times it occurs.

import java.util.Arrays;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.nio.file.Files;
import java.nio.file.FileSystems;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.counting;


public class Words {
  public static void main(String[] args) throws Exception {
    String fileContent = new String(Files.readAllBytes(FileSystems.getDefault().getPath(args[0])));
    // trim first so leading whitespace does not produce an empty first "word"
    String[] inputWords = fileContent.trim().split("\\s+");
    System.out.println("number of words is " + inputWords.length);

    List<List<String>> wordPairs = new ArrayList<>();

    String previousWord = null;
    for(String word: inputWords) {
      if(previousWord != null) wordPairs.add(Arrays.asList(previousWord, word));
      previousWord = word;
    }

    Map<List<String>, Long> pairCounts = wordPairs.stream().collect(groupingBy(pair -> pair, counting()));
    System.out.println(pairCounts);
  }
}
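The same grouping can be done without the intermediate wordPairs list by streaming over the index of the first word in each pair. This is a sketch assuming Java 8+. StreamPairs and countPairs are illustrative names and are not part of the answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class StreamPairs {
    // Counts adjacent word pairs; i is the index of the first word of each pair.
    static Map<List<String>, Long> countPairs(String[] words) {
        return IntStream.range(0, words.length - 1)   // empty when there are fewer than 2 words
                .mapToObj(i -> Arrays.asList(words[i], words[i + 1]))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        String text = "FIRST SECOND THIRD FIRST SECOND";
        System.out.println(countPairs(text.trim().split("\\s+")));
    }
}
```

Lists returned by Arrays.asList implement equals and hashCode element by element, so two separate [FIRST, SECOND] lists count as the same key.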

Upvotes: 1

Himanshu Ahire

Reputation: 717

You can use Apache Commons Collections' org.apache.commons.collections.map.MultiKeyMap, which lets you use two (or more) objects together as a single key. Then store an Integer as the value to keep the count.

    MultiKeyMap map = new MultiKeyMap();
    map.put("String1", "String2", 1);                     // first time the pair is seen
    Integer value = (Integer) map.get("String1", "String2");
    map.put("String1", "String2", value + 1);             // increment the count
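If you would rather not add the Commons Collections dependency, a nested map gives the same two-key lookup using only the JDK. This swaps in a different technique and is not MultiKeyMap itself. The sketch assumes Java 9+ (Map.of, List.of), and NestedPairCounts, countPairs and get are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NestedPairCounts {
    // Outer key: first word; inner key: the word that follows it; value: count.
    static Map<String, Map<String, Integer>> countPairs(List<String> words) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (int i = 0; i + 1 < words.size(); i++) {
            counts.computeIfAbsent(words.get(i), k -> new HashMap<>())
                  .merge(words.get(i + 1), 1, Integer::sum);   // add 1, starting from 1
        }
        return counts;
    }

    // Two-key lookup; returns 0 for a pair that never occurred.
    static int get(Map<String, Map<String, Integer>> counts, String first, String second) {
        return counts.getOrDefault(first, Map.of()).getOrDefault(second, 0);
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> counts =
                countPairs(List.of("FIRST", "SECOND", "THIRD", "FIRST", "SECOND"));
        System.out.println(get(counts, "FIRST", "SECOND")); // prints 2
    }
}
```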

Or you can build a combined key for the map, such as word1 + "|" + word2, and use an Integer value as the counter.

    Map<String, Integer> map = new HashMap<>();

    String key = "word1" + "|" + "word2";

    map.put(key, 1);               // autoboxing; new Integer(...) is deprecated
    Integer cntr = map.get(key);
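The combined-key snippet only stores an initial value. The sketch below shows the full counting loop using Map.merge (Java 8+; List.of needs Java 9+) and assumes "|" never occurs inside a word. CombinedKeyCounts and countPairs are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CombinedKeyCounts {
    // Assumes "|" never appears inside a word; otherwise two different pairs could share a key.
    static final String SEP = "|";

    static Map<String, Integer> countPairs(List<String> words) {
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i + 1 < words.size(); i++) {
            String key = words.get(i) + SEP + words.get(i + 1);
            map.merge(key, 1, Integer::sum);   // inserts 1, or adds 1 to the existing count
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = countPairs(List.of("FIRST", "SECOND", "THIRD", "FIRST", "SECOND"));
        System.out.println(map.get("FIRST|SECOND")); // prints 2
    }
}
```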

Upvotes: 1

scadge

Reputation: 9753

I would do something like the following:

  • select a delimiter, such as #, that never appears inside a word
  • save each pair with a counter in a map, e.g. FIRST#SECOND -> 2, SECOND#THIRD -> 1, THIRD#FIRST -> 1

Code:

Map<String, Integer> pairsCount = new HashMap<>();
Iterator<String> it = inputWords.iterator();   
String currentWord = null;
String previousWord = null;
while( it.hasNext() ) {
  currentWord = it.next();
  if( previousWord != null ) {
    String key = previousWord.concat( "#" ).concat( currentWord );
    if( pairsCount.containsKey( key ) ) {
      Integer lastCount = pairsCount.get( key );
      pairsCount.put( key, lastCount + 1 );
    } else {
      pairsCount.put( key, 1 );
    }
  }
  previousWord = currentWord;
}

// output all pairs with count, one per line
for( Map.Entry<String, Integer> entry : pairsCount.entrySet() ) {
  String[] words = entry.getKey().split( "#" );
  System.out.printf( "%s %s -> %d%n", words[0], words[1], entry.getValue() );
}
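The # key only works if # never appears inside a word. Otherwise the pairs (a#b, c) and (a, b#c) would both become a#b#c. The sketch below avoids choosing a delimiter by using a small record as the key. It assumes Java 16+ for records, and WordPair and RecordKeyCounts are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RecordKeyCounts {
    // A record gets equals() and hashCode() from its components,
    // so it works as a map key without joining the words into one string.
    record WordPair(String first, String second) { }

    static Map<WordPair, Integer> countPairs(List<String> words) {
        Map<WordPair, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < words.size(); i++) {
            counts.merge(new WordPair(words.get(i), words.get(i + 1)), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<WordPair, Integer> counts =
                countPairs(List.of("FIRST", "SECOND", "THIRD", "FIRST", "SECOND"));
        for (Map.Entry<WordPair, Integer> e : counts.entrySet()) {
            System.out.printf("%s %s -> %d%n", e.getKey().first(), e.getKey().second(), e.getValue());
        }
    }
}
```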

Upvotes: 0
