Sankalp
Sankalp

Reputation: 2086

which datastructure Or algorithm to use to arrange data of dictionary for sequence search?

I have a dictionary with near around a million words. I have to design algorithm for quick search of sequence of characters.

For ex. if user types and the app must return words having the sequence like random,sand,stand ...etc.

The existing solution I have is to search for matching regex in all the existing words which is not efficient. I am open to restructure existing database, caching of dictionary or work at any level if required Or is there some ready made api in java?

Upvotes: 1

Views: 163

Answers (2)

Himanshu Bhardwaj
Himanshu Bhardwaj

Reputation: 4113

http://lucene.apache.org/core/

Look at this, this should cater to your requirements.

final File INDEX_DIR = new File("index");  
try{  
    Class.forName("com.mysql.jdbc.Driver").newInstance();  
    Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/test", "root", "password");  
    StandardAnalyzer analyzer = new StandardAnalyzer();  
    IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer, true);  
    System.out.println("Indexing to directory '" + INDEX_DIR + "'...");  
    indexDocs(writer, conn);  
    writer.optimize();  
    writer.close();  
} catch (Exception e) {  
    e.printStackTrace();  
}  

void indexDocs(IndexWriter writer, Connection conn) throws Exception {  
String sql = "select id, name, color from pet";  
Statement stmt = conn.createStatement();  
ResultSet rs = stmt.executeQuery(sql);  
while (rs.next()) {  
    Document d = new Document();  
    d.add(new Field("id", rs.getString("id"), Field.Store.YES, Field.Index.NO));  
    d.add(new Field("name", rs.getString("name"), Field.Store.NO,  Field.Index.TOKENIZED));  
    d.add(new Field("address", rs.getString("address"),Field.Store.NO, Field.Index.TOKENIZED));  
    writer.addDocument(d);  
  }  
}  

Upvotes: 3

qwerty
qwerty

Reputation: 3869

I'd try using a trie (Where do I find a standard Trie based map implementation in Java?). Using an in memory lucene index may also fit the bill, depending on your requirements

Upvotes: 1

Related Questions