Reputation: 47
My program will read in a paragraph of words (stored in a text file). It will then need to do the following:
Limitations: I cannot use the Collections
class and I cannot store data multiple times. (e.g. Reading in words from the paragraph and storing them into a Set and an ArrayList)
Coding this won't be hard, but I can't figure out what would be the most efficient implementation since the data size could be a few paragraphs from a Wikipedia article or something. Here's my idea for now:
int
to record down the line number. These line numbers will be updated accordingly.This is somewhat incomplete, but it is what I'm thinking for now. The whole 'Word' class may probably be completely unnecessary, too.
Upvotes: 1
Views: 4879
Reputation: 7956
First, you could create a class that holds the data for the occurrences and the row numbers (along with the word). This class could implement the Comparable
interface, providing easy comparisons based on the word frequencies:
public class WordOccurrence implements Comparable<WordOccurrence> {
private final String word;
private int totalCount = 0;
private Set<Integer> lineNumbers = new TreeSet<>();
public WordOccurrence(String word, int firstLineNumber) {
this.word = word;
addOccurrence(firstLineNumber);
}
public final void addOccurrence(int lineNumber) {
totalCount++;
lineNumbers.add(lineNumber);
}
@Override
public int compareTo(WordOccurrence o) {
return totalCount - o.totalCount;
}
@Override
public String toString() {
StringBuilder lineNumberInfo = new StringBuilder("[");
for (int line : lineNumbers) {
if (lineNumberInfo.length() > 1) {
lineNumberInfo.append(", ");
}
lineNumberInfo.append(line);
}
lineNumberInfo.append("]");
return word + ", occurences: " + totalCount + ", on rows "
+ lineNumberInfo.toString();
}
}
When reading the words from the file, it's useful to return the data in a Map<String, WordOccurrence>
, mapping words into WordOccurrence
s. Using a TreeMap
, you'll get alphabetical ordering "for free". Also, you may want to remove punctuation from the lines (e.g. using a regexp like \\p{P}
) and ignore the case of the words:
public TreeMap<String, WordOccurrence> countOccurrences(String filePath)
throws IOException {
TreeMap<String, WordOccurrence> words = new TreeMap<>();
File file = new File(filePath);
BufferedReader reader = new BufferedReader(new InputStreamReader(
new FileInputStream(file)));
String line = null;
int lineNumber = 0;
while ((line = reader.readLine()) != null) {
// remove punctuation and normalize to lower-case
line = line.replaceAll("\\p{P}", "").toLowerCase();
lineNumber++;
String[] tokens = line.split("\\s+");
for (String token : tokens) {
if (words.containsKey(token)) {
words.get(token).addOccurrence(lineNumber);
} else {
words.put(token, new WordOccurrence(token, lineNumber));
}
}
}
return words;
}
Displaying the occurrences in alphabetical order using the above code is as simple as
for (Map.Entry<String, WordOccurrence> entry :
countOccurrences("path/to/file").entrySet()) {
System.out.println(entry.getValue());
}
If you cannot use Collections.sort()
(and a Comparator<WordOccurrence>
) for sorting by occurrences, you need to write the sorting yourself. Something like this should do it:
public static void displayInOrderOfOccurrence(
Map<String, WordOccurrence> words) {
List<WordOccurrence> orderedByOccurrence = new ArrayList<>();
// sort
for (Map.Entry<String, WordOccurrence> entry : words.entrySet()) {
WordOccurrence wo = entry.getValue();
// initialize the list on the first round
if (orderedByOccurrence.isEmpty()) {
orderedByOccurrence.add(wo);
} else {
for (int i = 0; i < orderedByOccurrence.size(); i++) {
if (wo.compareTo(orderedByOccurrence.get(i)) > 0) {
orderedByOccurrence.add(i, wo);
break;
} else if (i == orderedByOccurrence.size() - 1) {
orderedByOccurrence.add(wo);
break;
}
}
}
}
// display
for (WordOccurrence wo : orderedByOccurence) {
System.out.println(wo);
}
}
Running the above code using the following test data:
Potato; orange. Banana; apple, apple; potato. Potato.
will produce this output:
apple, occurrences: 2, on rows [2] banana, occurrences: 1, on rows [2] orange, occurrences: 1, on rows [1] potato, occurrences: 3, on rows [1, 2, 3] potato, occurrences: 3, on rows [1, 2, 3] apple, occurrences: 2, on rows [2] banana, occurrences: 1, on rows [2] orange, occurrences: 1, on rows [1]
Upvotes: 2
Reputation: 2316
you can have a structure like this one : https://gist.github.com/jeorfevre/946ede55ad93cc811cf8
/**
*
* @author Jean-Emmanuel [email protected]
*
*/
public class WordsIndex{
HashMap<String, Word> words = new HashMap<String, Word>();
public static void put(String word, int line, int paragraph){
word=word.toLowerCase();
if(words.containsKey(word)){
Word w=words.get(word);
w.count++;
}else{
//new word
Word w = new Word();
w.count=1;
w.line=line;
w.paragraph=paragraph;
w.word=word;
words.put(word, w);
}
}
}
public class Word{
String word;
int count;
int line;
int paragraph;
}
enjoy
Upvotes: 0
Reputation: 434
You can use a simple TreeMap<String, Integer>
for frequency lookups.
Lookups should be O(1), given that the words are short(i.e what you would find a normal text). If you expect lots of unsuccessful lookups (lots of searches for words that don't exist), you could prefilter using a Bloom Filter.
I'd start with a straightforward implementation, and optimize further if needed (parse the stream directly, instead of splitting each line with a separator and reiterating).
Upvotes: 2
Reputation: 1009
you can use TreeMap it is very suitable for getting the data ordered. use your word as key and the frequency as value. for example let the following is you paragraph
Java is good language Java is object oriented so I will do the following in order to store each word and its frequency
String s = "Java is good language Java is object oriented" ;
String strArr [] = s.split(" ") ;
TreeMap<String, Integer> tm = new TreeMap<String, Integer>();
for(String str : strArr){
if(tm.get(str) == null){
tm.put(str, 1) ;
}else{
int count = tm.get(str) ;
count+=1 ;
}
}
hopefully this will help you
Upvotes: 0