Reputation:
I have to count the number of unique words from a text document using Java. First I had to get rid of the punctuation in all of the words. I used the Scanner
class to scan each word in the document and put in an String ArrayList
.
So, the next step is where I'm having the problem! How do I create a method that can count the number of unique Strings in the array?
For example, if the array contains apple, bob, apple, jim, bob; the number of unique values in this array is 3.
public countWords() {
try {
Scanner scan = new Scanner(in);
while (scan.hasNext()) {
String words = scan.next();
if (words.contains(".")) {
words.replace(".", "");
}
if (words.contains("!")) {
words.replace("!", "");
}
if (words.contains(":")) {
words.replace(":", "");
}
if (words.contains(",")) {
words.replace(",", "");
}
if (words.contains("'")) {
words.replace("?", "");
}
if (words.contains("-")) {
words.replace("-", "");
}
if (words.contains("‘")) {
words.replace("‘", "");
}
wordStore.add(words.toLowerCase());
}
} catch (FileNotFoundException e) {
System.out.println("File Not Found");
}
System.out.println("The total number of words is: " + wordStore.size());
}
Upvotes: 9
Views: 57105
Reputation: 2971
3 distinct possible solutions:
Use HashSet as suggested above.
Create a temporary ArrayList
and store only unique element like below:
public static int getUniqueElement(List<String> data) {
List<String> newList = new ArrayList<>();
for (String eachWord : data)
if (!newList.contains(eachWord))
newList.add(eachWord);
return newList.size();
}
Java 8 solution
long count = data.stream().distinct().count();
Upvotes: 0
Reputation: 21
This general purpose solution takes advantage of the fact that the Set abstract data type does not allow duplicates. The Set.add() method is specifically useful in that it returns a boolean flag indicating the success of the 'add' operation. A HashMap is used to track the occurrence of each original element. This algorithm can be adapted for variations of this type of problem. This solution produces O(n) performance..
public static void main(String args[])
{
String[] strArray = {"abc", "def", "mno", "xyz", "pqr", "xyz", "def"};
System.out.printf("RAW: %s ; PROCESSED: %s \n",Arrays.toString(strArray), duplicates(strArray).toString());
}
public static HashMap<String, Integer> duplicates(String arr[])
{
HashSet<String> distinctKeySet = new HashSet<String>();
HashMap<String, Integer> keyCountMap = new HashMap<String, Integer>();
for(int i = 0; i < arr.length; i++)
{
if(distinctKeySet.add(arr[i]))
keyCountMap.put(arr[i], 1); // unique value or first occurrence
else
keyCountMap.put(arr[i], (Integer)(keyCountMap.get(arr[i])) + 1);
}
return keyCountMap;
}
RESULTS:
RAW: [abc, def, mno, xyz, pqr, xyz, def] ; PROCESSED: {pqr=1, abc=1, def=2, xyz=2, mno=1}
Upvotes: 1
Reputation: 103
public class UniqueinArrayList {
public static void main(String[] args) {
StringBuffer sb=new StringBuffer();
List al=new ArrayList();
al.add("Stack");
al.add("Stack");
al.add("over");
al.add("over");
al.add("flow");
al.add("flow");
System.out.println(al);
Set s=new LinkedHashSet(al);
System.out.println(s);
Iterator itr=s.iterator();
while(itr.hasNext()){
sb.append(itr.next()+" ");
}
System.out.println(sb.toString().trim());
}
}
Upvotes: 0
Reputation: 56616
Starting from Java 8 you can use Stream
:
After you add the elements in your ArrayList
:
long n = wordStore.stream().distinct().count();
It converts your ArrayList
to a stream and then it counts only the distinct elements.
Upvotes: 19
Reputation: 24411
Although I believe a set is the easiest solution, you can still use your original solution and just add an if statement to check if value already exists in the list before you do your add.
if( !wordstore.contains( words.toLowerCase() )
wordStore.add(words.toLowerCase());
Then the number of words in your list is the total number of unique words (ie: wordStore.size() )
Upvotes: 2
Reputation: 9134
In shorthand way you can do it as follows...
ArrayList<String> duplicateList = new ArrayList<String>();
duplicateList.add("one");
duplicateList.add("two");
duplicateList.add("one");
duplicateList.add("three");
System.out.println(duplicateList); // prints [one, two, one, three]
HashSet<String> uniqueSet = new HashSet<String>();
uniqueSet.addAll(duplicateList);
System.out.println(uniqueSet); // prints [two, one, three]
duplicateList.clear();
System.out.println(duplicateList);// prints []
duplicateList.addAll(uniqueSet);
System.out.println(duplicateList);// prints [two, one, three]
Upvotes: 0
Reputation: 34367
I would advice to use HashSet. This automatically filters the duplicate when calling add
method.
Upvotes: 3
Reputation: 4827
You can create a HashTable or HashMap as well. Keys would be your input strings and Value would be the number of times that string occurs in your input array. O(N) time and space.
Solution 2:
Sort the input list. Similar strings would be next to each other. Compare list(i) to list(i+1) and count the number of duplicates.
Upvotes: 0
Reputation: 66637
Are you allowed to use Set? If so, you HashSet may solve your problem. HashSet
doesn't accept duplicates.
HashSet noDupSet = new HashSet();
noDupSet.add(yourString);
noDupSet.size();
size()
method returns number of unique words.
If you have to really use ArrayList
only, then one way to achieve may be,
1) Create a temp ArrayList
2) Iterate original list and retrieve element
3) If tempArrayList doesn't contain element, add element to tempArrayList
Upvotes: 25