Reputation: 2086
I have a dictionary with near around a million words. I have to design algorithm for quick search of sequence of characters.
For ex. if user types and
the app must return words having the sequence like random
,sand
,stand
...etc.
The existing solution I have is to search for matching regex in all the existing words which is not efficient. I am open to restructure existing database, caching of dictionary or work at any level if required Or is there some ready made api in java?
Upvotes: 1
Views: 163
Reputation: 4113
http://lucene.apache.org/core/
Look at this, this should cater to your requirements.
final File INDEX_DIR = new File("index");
try{
Class.forName("com.mysql.jdbc.Driver").newInstance();
Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/test", "root", "password");
StandardAnalyzer analyzer = new StandardAnalyzer();
IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer, true);
System.out.println("Indexing to directory '" + INDEX_DIR + "'...");
indexDocs(writer, conn);
writer.optimize();
writer.close();
} catch (Exception e) {
e.printStackTrace();
}
void indexDocs(IndexWriter writer, Connection conn) throws Exception {
String sql = "select id, name, color from pet";
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery(sql);
while (rs.next()) {
Document d = new Document();
d.add(new Field("id", rs.getString("id"), Field.Store.YES, Field.Index.NO));
d.add(new Field("name", rs.getString("name"), Field.Store.NO, Field.Index.TOKENIZED));
d.add(new Field("address", rs.getString("address"),Field.Store.NO, Field.Index.TOKENIZED));
writer.addDocument(d);
}
}
Upvotes: 3
Reputation: 3869
I'd try using a trie (Where do I find a standard Trie based map implementation in Java?). Using an in memory lucene index may also fit the bill, depending on your requirements
Upvotes: 1