Reputation: 137
So I'm trying to search through an Arraylist in Java and create a histogram consisting of lengths of string vs frequency that length is present in large text files. I've come up with a brute force algorithm but its much too slow to be of use in large data files. Is there a more efficient way of processing through an Arraylist? I've included the brute force method I came up with.
for (int i = 0; i < (maxLen + 1); i++)
{
int hit = 0;
for (int j = 0; j < list.size(); j++)
{
if (i == list.get(j).length())
++hit;
histogram[i] = hit;
}
}
Upvotes: 1
Views: 494
Reputation: 11592
This is terribly inefficient.
How about instead of looping through each possible length value, then each available word, you simply loop through the available words in the document and count their lengths?
For example:
Map<Integer, Integer> frequencies = new HashMap<Integer, Integer>();
for(int i=0; i<list.size(); i++) {
String thisWord = list.get(i);
Integer theLength = (Integer)(thisWord.length());
if(frequencies.containsKey(theLength) {
frequencies.put(theLength, new Integer(frequencies.get(theLength).intValue()+1));
}
else {
frequencies.put(theLength, new Integer(1));
}
}
Then, if the key does not exist in the HashMap
, you know no words of that length exist in the document. If the key does exist, you can look up exactly how many times that occurred.
Note: Some aspects of this code example were made in order to prevent any additional confusion about boxing and unboxing. It is possible to write it slightly cleaner, and I would certainly do so in a production environment. Also, it assumes that you don't have knowledge of any minimum or maximum lengths of words (and is thus slightly more flexible, scalable, and catch-all). Otherwise, the other techniques for simply declaring a primitive array will work just as well (see Jon Skeet's answer).
For a cleaner version that takes advantage of autoboxing:
Map<Integer, Integer> frequencies = new HashMap<Integer, Integer>();
for(int i=0; i<list.size(); i++) {
String thisWord = list.get(i);
if(frequencies.containsKey(thisWord.length()) {
frequencies.put(thisWord.length(), frequencies.get(thisWord.length())+1);
}
else {
frequencies.put(thisWord.length(), 1);
}
}
Upvotes: 2
Reputation: 1502016
Why don't you just loop over the list once?
int[] histogram = new int[maxLen + 1]; // All entries will be 0 to start with
for (String text : list) {
if (text.length() <= maxLen) {
histogram[text.length()]++;
}
}
This is now just O(N).
Upvotes: 1