Reputation: 3
I have two classes claim and index. i have a field in my claim class called topic which is a string. I m trying to index the topic column not using database index column features. But it should by coding the following method. Suppose i have claim 1, for claim 1 topic field ("i love muffins muffins") i ll do the folowing treatment
#1. Create an empty Dictionary with "word"=>occurrences
#2. Create a List of the stopwords exemple stopwords = ("For","This".....etc )
#3. Create List of the delimiters exemple delimiter_chars = ",.;:!?"
#4. Split the Text(topic field) into words delimited by whitespace.
#5. Remove unwanted delimiter characters adjoining words.
#6. Remove stopwords.
#7. Remove Duplicate
#8. now i create multiple index object (word="love",occurences = 1,looked = 0,reference on claim 1),(word="muffins",occurences = 2,looked = 0,reference on claim 1),
now whenever i look the word muffins for exemple looked will increase by one and i will move the record up in my database. So my question is the following is this method good ? is it better than database index features ? is there someways to improve things ?
Upvotes: 0
Views: 82
Reputation: 1439
What I think you are looking for is something called a B-Tree. In your case, you would use a 26 (or 54 if you need case sensitivity) branch node in the tree. This will make finding objects very fast. I think the time is nlogn or something. In the node, you would have a pointer to the actual data in an array, list, file, or something else.
However, unless you are willing to put the time in to code something specific for your application, you might be better off using a database such as Oracle, Microsoft SQL Server, or MySQL because these are professionally developed and profiled to get the maximum performance possible.
Upvotes: 1