Reputation: 169
I want to implement the following. I will read a .txt file and convert it from upper-case to lower-case letters, writing the result to another .txt file. Then I have to count the contents so that I can find the most common letters or words. My question is: what kind of table should I use for that? Should I use a hash table or a map? The .txt file contains about 5,000,000 letters, words, and sentences. Which structure should I use to store, compare, count, and convert large files so that I can retrieve the results quickly? I have thought of a hash table:
HashMap<String, String> hm = new HashMap<String, String>();
Or should I do it some other way? Should I use a linked list? How can I implement it for sentences or words?
Upvotes: 3
Views: 1065
Reputation: 194
Have a look here http://www.ntu.edu.sg/home/ehchua/programming/java/J5c_Collection.html#zz-2.6
and here http://www.ntu.edu.sg/home/ehchua/programming/java/J5c_Collection.html#zz-4.
and here http://www.javapractices.com/topic/TopicAction.do?Id=65 and the best, in my opinion, is here http://www.javamex.com/tutorials/collections/how_to_choose.shtml.
Have fun
Upvotes: 1
Reputation: 14243
You're going to need a Map for each requirement. For sentences:
Map<String, Integer> sentences = new HashMap<String, Integer>();
For words, the same:
Map<String, Integer> words = new HashMap<String, Integer>();
Finally, for characters, use the following:
Map<Character, Integer> chars = new HashMap<Character, Integer>();
HashMap should be the Map implementation you use, since you'll be doing a lot of searching within those maps. The counting process does lend itself well to multiple threads, so you may need a thread-safe Map if you decide on that approach.
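To make the counting step concrete, here is a minimal sketch of how such maps could be filled. It assumes Java 8 or later (for Map.merge) and that the whole text is already in memory as a String; the class name FrequencyCounter and its method names are made up for illustration, and words are split on whitespace only, which is a simplification.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, not part of any library.
final class FrequencyCounter {

    // Counts whitespace-separated words, ignoring case.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> words = new HashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty()) {
                // Inserts 1 for a new key, otherwise adds 1 to the old count.
                words.merge(word, 1, Integer::sum);
            }
        }
        return words;
    }

    // Counts letters only (digits, spaces, punctuation are skipped), ignoring case.
    static Map<Character, Integer> countLetters(String text) {
        Map<Character, Integer> chars = new HashMap<>();
        for (char c : text.toLowerCase(Locale.ROOT).toCharArray()) {
            if (Character.isLetter(c)) {
                chars.merge(c, 1, Integer::sum);
            }
        }
        return chars;
    }

    // Returns the key with the highest count, or null for an empty map.
    static <K> K mostCommon(Map<K, Integer> counts) {
        K best = null;
        int bestCount = 0;
        for (Map.Entry<K, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }
}
```

For a large file you would probably call these per line while reading with a BufferedReader rather than loading everything at once. If you go multi-threaded, a ConcurrentHashMap is one option, since its merge method is atomic.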
Upvotes: 1
Reputation: 93
I would suggest a database approach. You can also use a map to track whether each key needs an insert or an update of its count:
Map<String, Boolean>
You can also use batch processing to handle multiple queries at a time.
Upvotes: 0
Reputation: 24895
If you want to count letters, a Map<Character, Long> or even a Map<Character, BigInteger> seems more suitable. The concrete implementation is not that important. If your set of letters is fixed and small (say, the Latin alphabet), you can even use a BigInteger[], since each letter can easily be replaced by its index in the array.
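As a rough sketch of the array idea, assuming only the 26 unaccented Latin letters matter: the example below uses a long[] instead of a BigInteger[], which is a swap on my part, since a long is more than enough for a few million characters. The class and method names are invented for the example.

```java
// Hypothetical helper class, not part of any library.
final class LetterArrayCounter {

    // Returns 26 counters: index 0 is 'a', index 25 is 'z'. Case is ignored.
    static long[] countLatinLetters(String text) {
        long[] counts = new long[26];
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                c = (char) (c - 'A' + 'a'); // fold ASCII upper case to lower case
            }
            if (c >= 'a' && c <= 'z') {
                counts[c - 'a']++; // the letter's position in the alphabet is its index
            }
        }
        return counts;
    }
}
```

Anything outside a-z (digits, punctuation, accented letters) is simply skipped here; if you need those, the Map version is the easier route.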
For sentences or words in these numbers, I would go for a database approach, with a row for each value you want to count.
UPDATE: An alternative approach for words and sentences, using in-memory data structures, could be a tree. The root node is the empty word. If you find "dad", then starting from the root you follow the child "d", its grandchild "a", and its great-grandchild "d"; at that point you add 1 to the counter of that last node (of course, if any of the nodes is missing, you have to create it).
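The tree described in the update is usually called a trie. A minimal sketch could look like the following, assuming Java 8 or later (for computeIfAbsent); WordTrie, Node, add and count are names chosen for this example only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class, not part of any library.
final class WordTrie {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        long count; // how many times the word ending at this node was added
    }

    private final Node root = new Node(); // represents the empty word

    // Walks down one node per character, creating missing nodes, then adds 1.
    void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), k -> new Node());
        }
        node.count++;
    }

    // Returns how often the exact word was added, or 0 if it never was.
    long count(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.get(word.charAt(i));
            if (node == null) {
                return 0;
            }
        }
        return node.count;
    }
}
```

Words that share a prefix share nodes, so "dad" and "daddy" reuse the same first three nodes. Whether that actually saves memory compared to a HashMap depends on the data, so it is worth measuring.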
Upvotes: 1