user1405298

How to Count Unique Values in an ArrayList?

I have to count the number of unique words in a text document using Java. First I had to get rid of the punctuation in all of the words. I used the Scanner class to scan each word in the document and put it in a String ArrayList.

So, the next step is where I'm having the problem! How do I create a method that counts the number of unique Strings in the list?

For example, if the list contains apple, bob, apple, jim, bob, the number of unique values in it is 3.


public void countWords() {
    try {
        Scanner scan = new Scanner(in);
        while (scan.hasNext()) {
            // Strings are immutable: replace() returns a new String,
            // so the result has to be kept rather than discarded.
            String words = scan.next()
                    .replace(".", "")
                    .replace("!", "")
                    .replace(":", "")
                    .replace(",", "")
                    .replace("?", "")
                    .replace("-", "")
                    .replace("‘", "");
            wordStore.add(words.toLowerCase());
        }
    } catch (FileNotFoundException e) {
        System.out.println("File Not Found");
    }
    System.out.println("The total number of words is: " + wordStore.size());
}

Upvotes: 9

Views: 57105

Answers (9)

ChandraBhan Singh

Reputation: 2971

Three possible solutions:

  1. Use a HashSet, as suggested in other answers.

  2. Create a temporary ArrayList and store only the unique elements in it, as below:

    public static int getUniqueElement(List<String> data) {
        List<String> newList = new ArrayList<>();
        for (String eachWord : data) {
            // Only keep the first occurrence of each word
            if (!newList.contains(eachWord)) {
                newList.add(eachWord);
            }
        }
        return newList.size();
    }
    
  3. Java 8 solution

    long count = data.stream().distinct().count();
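
Option 1 is listed above without code, so here is a minimal sketch of what it could look like. The class name `UniqueWithHashSet` and the method name `countUnique` are made up for illustration; the sample data is the list from the question.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

class UniqueWithHashSet {

    // Copies the list into a HashSet, which keeps one copy of each value,
    // and returns the size of that set.
    static int countUnique(List<String> data) {
        return new HashSet<>(data).size();
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("apple", "bob", "apple", "jim", "bob");
        System.out.println(countUnique(data)); // prints 3
    }
}
```

Copying into a set costs extra memory proportional to the number of unique words, but each lookup is constant time on average, unlike the `contains` call on a list in option 2.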
    

Upvotes: 0

Casmon Gordon

Reputation: 21

This general-purpose solution takes advantage of the fact that the Set abstract data type does not allow duplicates. The Set.add() method is especially useful because it returns a boolean flag indicating whether the element was actually added. A HashMap tracks the number of occurrences of each element, so the number of unique values is the size of the returned map. The algorithm can be adapted to variations of this type of problem and runs in O(n) time on average.

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;

public class DuplicateCounter {

    public static void main(String[] args) {
        String[] strArray = {"abc", "def", "mno", "xyz", "pqr", "xyz", "def"};
        System.out.printf("RAW: %s ; PROCESSED: %s %n", Arrays.toString(strArray), duplicates(strArray));
    }

    // Returns a map from each distinct string to the number of times it occurs.
    public static HashMap<String, Integer> duplicates(String[] arr) {
        HashSet<String> distinctKeySet = new HashSet<>();
        HashMap<String, Integer> keyCountMap = new HashMap<>();

        for (String key : arr) {
            if (distinctKeySet.add(key)) {
                keyCountMap.put(key, 1); // add() returned true: first occurrence
            } else {
                keyCountMap.put(key, keyCountMap.get(key) + 1); // duplicate: increment the count
            }
        }

        return keyCountMap;
    }
}

RESULTS (HashMap iteration order is not guaranteed, so the entries may print in a different order):

RAW: [abc, def, mno, xyz, pqr, xyz, def] ; PROCESSED: {pqr=1, abc=1, def=2, xyz=2, mno=1}

Upvotes: 1

S N Prasad Rao

Reputation: 103

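import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UniqueinArrayList {

    public static void main(String[] args) {
        List<String> al = new ArrayList<>();
        al.add("Stack");
        al.add("Stack");
        al.add("over");
        al.add("over");
        al.add("flow");
        al.add("flow");
        System.out.println(al); // [Stack, Stack, over, over, flow, flow]

        // A LinkedHashSet drops duplicates and keeps first-insertion order;
        // s.size() is the number of unique values (3 here).
        Set<String> s = new LinkedHashSet<>(al);
        System.out.println(s); // [Stack, over, flow]

        StringBuilder sb = new StringBuilder();
        for (String word : s) {
            sb.append(word).append(" ");
        }
        System.out.println(sb.toString().trim()); // Stack over flow
    }

}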

Upvotes: 0

ROMANIA_engineer

Reputation: 56616

Starting from Java 8 you can use Stream:

After you have added the elements to your ArrayList:

long n = wordStore.stream().distinct().count();

It converts your ArrayList to a stream and then it counts only the distinct elements.
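
As a runnable illustration, here is a small sketch using the sample words from the question. The class name `DistinctStreamDemo` and the helper `countDistinct` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DistinctStreamDemo {

    // distinct() compares elements with equals(), so "Apple" and "apple"
    // would count as two different words unless they are lowercased first.
    static long countDistinct(List<String> words) {
        return words.stream().distinct().count();
    }

    public static void main(String[] args) {
        List<String> wordStore =
                new ArrayList<>(Arrays.asList("apple", "bob", "apple", "jim", "bob"));
        System.out.println(countDistinct(wordStore)); // prints 3
    }
}
```

Note that `count()` returns a `long`, not an `int`.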

Upvotes: 19

Eric B.

Reputation: 24411

Although I believe a set is the easiest solution, you can still use your original solution and just add an if statement to check whether the value already exists in the list before you add it.

if (!wordStore.contains(words.toLowerCase()))
    wordStore.add(words.toLowerCase());

Then the number of words in your list is the total number of unique words (i.e. wordStore.size()).
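
Here is a self-contained sketch of that idea. To keep it runnable without a file, it assumes the text arrives as a String rather than from the question's document; the class name `UniqueWhileReading` and the method `readUniqueWords` are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class UniqueWhileReading {

    // Reads whitespace-separated words and keeps only the first
    // occurrence of each lowercased word.
    static List<String> readUniqueWords(String text) {
        List<String> wordStore = new ArrayList<>();
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNext()) {
                String word = scan.next().toLowerCase();
                if (!wordStore.contains(word)) { // linear scan of the list for every word
                    wordStore.add(word);
                }
            }
        }
        return wordStore;
    }

    public static void main(String[] args) {
        List<String> unique = readUniqueWords("apple bob Apple jim bob");
        System.out.println(unique);        // prints [apple, bob, jim]
        System.out.println(unique.size()); // prints 3
    }
}
```

Because `contains` walks the whole list each time, this is quadratic in the worst case, which is fine for small documents but is the main reason a set is usually preferred.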

Upvotes: 2

namalfernandolk

Reputation: 9134

In short, you can do it as follows:

    ArrayList<String> duplicateList = new ArrayList<String>();
    duplicateList.add("one");
    duplicateList.add("two");
    duplicateList.add("one");
    duplicateList.add("three");

    System.out.println(duplicateList); // prints [one, two, one, three]

    HashSet<String> uniqueSet = new HashSet<String>();

    uniqueSet.addAll(duplicateList);
    System.out.println(uniqueSet); // prints the unique values, e.g. [two, one, three]; HashSet order is not guaranteed

    duplicateList.clear();
    System.out.println(duplicateList);// prints []


    duplicateList.addAll(uniqueSet);
    System.out.println(duplicateList); // prints the unique values in the set's iteration order, e.g. [two, one, three]

Upvotes: 0

Yogendra Singh

Reputation: 34367

I would advise using a HashSet. It automatically filters out duplicates when you call the add method.
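
A short sketch of that filtering behaviour, with hypothetical names (`HashSetAddDemo`, `countUnique`) and the sample words from the question:

```java
import java.util.HashSet;
import java.util.Set;

class HashSetAddDemo {

    // Adds every word to a HashSet and returns how many distinct words were kept.
    static int countUnique(String[] words) {
        Set<String> unique = new HashSet<>();
        for (String word : words) {
            unique.add(word); // for a duplicate, add() returns false and the set is unchanged
        }
        return unique.size();
    }

    public static void main(String[] args) {
        Set<String> unique = new HashSet<>();
        System.out.println(unique.add("apple")); // prints true
        System.out.println(unique.add("apple")); // prints false
        System.out.println(countUnique(
                new String[] {"apple", "bob", "apple", "jim", "bob"})); // prints 3
    }
}
```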

Upvotes: 3

FSP

Reputation: 4827

You can use a Hashtable or HashMap as well. The keys would be your input strings and each value would be the number of times that string occurs in your input array. The number of unique words is then the number of keys. O(N) time and space.
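
One way this could be sketched is with `Map.merge` (available since Java 8), which inserts 1 for a new key and otherwise adds 1 to the existing count. The class name `WordFrequency` and the method `countOccurrences` are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordFrequency {

    // Maps each distinct word to the number of times it occurs.
    static Map<String, Integer> countOccurrences(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countOccurrences(Arrays.asList("apple", "bob", "apple", "jim", "bob"));
        System.out.println(counts.size());       // prints 3 (number of unique words)
        System.out.println(counts.get("apple")); // prints 2
    }
}
```

If only the unique count is needed, the per-word counts are extra work, but they come for free if word frequencies are wanted later.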

Solution 2:

Sort the input list. Equal strings will then be next to each other. Compare list(i) to list(i+1) and count the number of duplicates; the number of unique strings is the list size minus that count.
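
A possible sketch of the sorting approach follows. Instead of counting duplicates and subtracting, it counts the positions where the value changes, which gives the same result. The names `SortedUniqueCount` and `countUnique` are hypothetical, and the sketch assumes the list contains no null elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortedUniqueCount {

    // Sorts a copy of the list, then counts how often a value differs
    // from the one just before it.
    static int countUnique(List<String> words) {
        if (words.isEmpty()) {
            return 0;
        }
        List<String> sorted = new ArrayList<>(words); // copy so the caller's list keeps its order
        Collections.sort(sorted);
        int unique = 1; // the first element always starts a new run
        for (int i = 1; i < sorted.size(); i++) {
            if (!sorted.get(i).equals(sorted.get(i - 1))) {
                unique++;
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        System.out.println(countUnique(
                Arrays.asList("apple", "bob", "apple", "jim", "bob"))); // prints 3
    }
}
```

Sorting makes this O(N log N) in time, but it needs no hash table.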

Upvotes: 0

kosa

Reputation: 66637

Are you allowed to use Set? If so, a HashSet may solve your problem. HashSet doesn't accept duplicates.

Set<String> noDupSet = new HashSet<>();
noDupSet.add(yourString);
noDupSet.size();

The size() method returns the number of unique words.

If you really have to use only ArrayList, one way to achieve this is:

1) Create a temp ArrayList
2) Iterate original list and retrieve element
3) If tempArrayList doesn't contain element, add element to tempArrayList

Upvotes: 25
