user1577708

Reputation: 169

Suitable table for storing and counting large text in Java

I want to read a .txt file and write a lowercase copy of it to another .txt file. Then I have to count the letters or words to find the most common ones. What kind of table should I use for that: a hash table or a map? The .txt file contains about 5,000,000 letters, words, and sentences. Which structure should I use to store, compare, count, and convert such large files so that I can retrieve the results quickly? I have thought of a hash table:

    HashMap<String, String> hm = new HashMap<String, String>();

Or should I do it some other way, for example with a linked list? How can I implement it for sentences or words?

Upvotes: 3

Views: 1065

Answers (4)

David Grant

Reputation: 14243

You're going to need a Map for each requirement. For sentences:

    Map<String, Integer> sentences = new HashMap<String, Integer>();

For words, the same:

    Map<String, Integer> words = new HashMap<String, Integer>();

Finally, for characters, use the following:

    Map<Character, Integer> chars = new HashMap<Character, Integer>();

HashMap should be the Map implementation you use, since you'll be doing a lot of lookups in those maps. The counting process lends itself well to multiple threads, so you may need a thread-safe Map (such as ConcurrentHashMap) if you decide on that approach.
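As a rough sketch of how the counting could look with these maps (assuming Java 8 or later for `Map.merge`; the class and method names here are made up for illustration, and splitting on whitespace is only one possible definition of a "word"):

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, not part of any library.
final class FrequencyCounter {

    // Counts how often each whitespace-separated word occurs, ignoring case.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> words = new HashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty()) {
                // Stores 1 for a new word, otherwise adds 1 to the existing count.
                words.merge(word, 1, Integer::sum);
            }
        }
        return words;
    }

    // Counts how often each letter occurs, ignoring case and skipping non-letters.
    static Map<Character, Integer> countLetters(String text) {
        Map<Character, Integer> chars = new HashMap<>();
        for (char c : text.toLowerCase(Locale.ROOT).toCharArray()) {
            if (Character.isLetter(c)) {
                chars.merge(c, 1, Integer::sum);
            }
        }
        return chars;
    }

    public static void main(String[] args) {
        String sample = "The cat saw the other cat";
        System.out.println(countWords(sample));
        System.out.println(countLetters(sample));
    }
}
```

For a large file you would probably call these once per line while reading, merging into the same maps, rather than loading the whole text into one String.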

Upvotes: 1

Dipak Patil

Reputation: 93

I would suggest using a database approach. You can also use a map to handle the insert-or-update logic for the counts:

    Map<String, Boolean>

You can also use batch processing to run multiple queries at a time.

Upvotes: 0

SJuan76

Reputation: 24895

If you want to count letters, a Map<Character, Long> or even a Map<Character, BigInteger> seems more suitable. The concrete implementation is not that important. If your set of letters is fixed and small (say, the Latin alphabet), you can even use a BigInteger[], since each letter can easily be replaced by its position in the array.
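A minimal sketch of the array idea, assuming only the 26 letters a to z matter and that the counts fit in a `long` (so it uses `long[]` rather than `BigInteger[]`); the class and method names are hypothetical:

```java
// Hypothetical helper class, not part of any library.
final class LetterArrayCounter {

    // Returns 26 counters; index 0 holds the count for 'a', index 25 for 'z'.
    static long[] countLatinLetters(String text) {
        long[] counts = new long[26];
        for (int i = 0; i < text.length(); i++) {
            char c = Character.toLowerCase(text.charAt(i));
            if (c >= 'a' && c <= 'z') {
                // The letter's position in the alphabet is its array index.
                counts[c - 'a']++;
            }
        }
        return counts;
    }

    // Returns the most frequent letter; ties go to the earlier letter.
    static char mostCommonLetter(long[] counts) {
        int best = 0;
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] > counts[best]) {
                best = i;
            }
        }
        return (char) ('a' + best);
    }
}
```

For example, `mostCommonLetter(countLatinLetters("Hello World"))` gives `'l'`, since "l" appears three times.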

For sentences or words in these numbers, I would go for a database approach, with a row for each value you want to count.

UPDATE: An alternative approach for words and sentences using in-memory data structures is a tree (a trie). The root node is the empty word. If you find "dad", then starting from the root you reach the child "d", the grandchild "a", and the great-grandchild "d"; at that point you add 1 to the counter of that last node (of course, if any of the nodes is missing you have to create it).
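The tree described above could be sketched roughly like this (assuming Java 8 or later for `computeIfAbsent`; `WordTrie`, `Node`, `add`, and `count` are names invented for this example, and each node keeps its children in a HashMap, which is only one of several possible layouts):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class, not part of any library.
final class WordTrie {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int count; // number of times a word ending at this node was added
    }

    private final Node root = new Node(); // the root stands for the empty word

    // Walks down from the root one character at a time, creating any
    // missing nodes, then increments the counter on the last node.
    void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.count++;
    }

    // Returns how many times the word was added, or 0 if it was never seen.
    int count(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return 0;
            }
        }
        return node.count;
    }
}
```

After adding "dad" twice and "da" once, `count("dad")` is 2, `count("da")` is 1, and `count("d")` is 0, because no word ended at that node.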

Upvotes: 1
