Reputation: 374
I'm using Lucene to query a website's database, but I'm running into some problems. I don't actually know whether they come from indexing or from searching (more precisely, from how the queries are constructed). As far as I'm aware, when searching several SQL database tables it's better to use more than one document for each table (I followed these tutorials:
http://kalanir.blogspot.pt/2008/06/indexing-database-using-apache-lucene.html
http://www.lucenetutorial.com/techniques/indexing-databases.html
) which are close to what I want to do. In my case I have to search 3 tables that are all related, because each one further specifies the level above it (e.g. product -> type -> color). My indexing code looks something like this:
String sql = "select c.idConteudo as ID, c.designacao as DESIGNACAO, cd.texto as DESCRICAO, ctf.webTag as TAG from Conteudo c, ConteudoDetalhe cd, ConteudoTipoFormato ctf where c.idConteudo = cd.idConteudo AND cd.idConteudoTipoFormato = ctf.idConteudoTipoFormato;";
Statement stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery(sql);
Document document;
while (rs.next())
{
    String S = new String();
    S += IndexerCounter;
    document = new Document();
    document.add(new Field("ID_ID",S, Field.Store.YES, Field.Index.NO));
    document.add(new Field("ID CONTEUDO", rs.getString("ID"), Field.Store.YES, Field.Index.NO));
    document.add(new Field("DESIGNACAO", rs.getString("DESIGNACAO"), Field.Store.NO, Field.Index.TOKENIZED));
    document.add(new Field("DESCRICAO", rs.getString("DESCRICAO"), Field.Store.NO, Field.Index.TOKENIZED));
    document.add(new Field("TAG", rs.getString("TAG"), Field.Store.NO, Field.Index.TOKENIZED));
    try{
        writer.addDocument(document);
    }catch(CorruptIndexException e){
    }catch(IOException e){
    }catch(Exception e){ } //just for knowing if something is wrong
    IndexerCounter++;
}
If I output the results, they look something like this:
ID: idConteudo: designacao: texto: webTag
1:1:Xor:xor 1 Descricao:x or
2:1:Xor:xor 2 Descricao:xis Or
3:1:Xor:xor 3 Descricao:exor
4:2:And:and 1 Descricao:and
5:2:And:and 2 Descricao:&
6:2:And:and 3 Descricao:ande
7:2:And:and 4 Descricao:a n d
8:2:And:and 5 Descricao:and,
9:3:Nor:nor 1 Descricao:nor
10:3:Nor:nor 2 Descricao:not or
What I really want is to take a query (for example, Xor) and search the created documents for it. My searching code looks something like this:
Constructor:
public Spider(String Query, String Pathh) {
    String[] Q;
    QueryFromUser = new String();
    QueryFromUser = Query;
    QueryToSearch1 = new String();
    QueryToSearch2 = new String();
    Path = Pathh;
    try {
        try {
            Class.forName("com.mysql.jdbc.Driver");
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            return;
        }
        try {
            connection = DriverManager.getConnection("jdbc:mysql://localhost:3306/mydb", "root", "");
        } catch (SQLException e) {
            e.printStackTrace();
            return;
        }
        Q = Query.split(" ");
        //NOTE: the AND word enables the search engine to search by the various words in a query
        for (int i = 0; i < Q.length; i++) {
            if ((Q.length - i) > 1) //prevents the last one to take a AND
            {
                QueryToSearch1 += Q[i] + " AND ";
            } else {
                QueryToSearch1 += Q[i];
            }
        }
        for (int i = 0; i < Q.length; i++) {
            QueryToSearch2 += "+" + Q[i];
        }
        try {
            SEARCHING_CONTENT();
        } catch (ClassNotFoundException ex) {
            Logger.getLogger(Spider.class.getName()).log(Level.SEVERE, null, ex);
        } catch (InstantiationException ex) {
            Logger.getLogger(Spider.class.getName()).log(Level.SEVERE, null, ex);
        } catch (IllegalAccessException ex) {
            Logger.getLogger(Spider.class.getName()).log(Level.SEVERE, null, ex);
        } catch (SQLException ex) {
            Logger.getLogger(Spider.class.getName()).log(Level.SEVERE, null, ex);
        } catch (ParseException ex) {
            Logger.getLogger(Spider.class.getName()).log(Level.SEVERE, null, ex);
        }
        SEARCHING_WEB(); //not for using now
    } catch (CorruptIndexException ex) {
        Logger.getLogger(Spider.class.getName()).log(Level.SEVERE, null, ex);
    } catch (IOException ex) {
        Logger.getLogger(Spider.class.getName()).log(Level.SEVERE, null, ex);
    }
}
The idea is that QueryToSearch1 and QueryToSearch2 contain the AND and + operators (I saw this in an online tutorial, I don't quite remember where). So for a user query "not or", what gets searched is "not AND or", to search for the two words simultaneously, and "+not+or", to search for the two words separately. This is one of my doubts: I don't really know whether Lucene queries are constructed like this. The problem shows up in the Querying method:
private void SEARCHING_CONTENT() throws CorruptIndexException, IOException, ClassNotFoundException, InstantiationException, IllegalAccessException, SQLException, ParseException {
    Querying(QueryToSearch1); // search for the whole phrase
    Querying(QueryToSearch2); //search by individual words
    //Querying(QueryFromUser); //search by individual words
}
private void Querying(String QueryS) throws CorruptIndexException, IOException, ClassNotFoundException, InstantiationException, IllegalAccessException, SQLException, ParseException {
    searcher = new IndexSearcher(IndexReader.open(Path + "/INDEX_CONTENTS"));
    query = new QueryParser("TAG", new StopWords()).parse(QueryS);
    query.toString();
    hits = searcher.search(query);
    pstmt = connection.prepareStatement(sql);
    for (int i = 0; i < hits.length(); i++) {
        id = hits.doc(i).get("TAG");
        pstmt.setString(1, id);
        displayResults(pstmt);
    }
}
There are no hits on the documents for the query. It is important to note that in the following line:
query = new QueryParser("TAG", new StopWords()).parse(QueryS);
StopWords is a class I wrote that extends StandardAnalyzer, but with a stop-word list that I specified (so that words that are important to my search, like "or" and "and", are NOT removed; in this case those words may matter).
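To make the role of the stop-word list concrete, here is a minimal stand-alone sketch in plain Java. It is not Lucene's StandardAnalyzer, only a simplified model that lowercases, splits on non-alphanumeric characters, and drops stop words. The class name StopWordSketch and both word lists are assumptions made for illustration. The post does not show which analyzer the IndexWriter was created with; if indexing used a default English stop list while searching uses the custom one, tags such as "not or" could end up with no indexed terms at all, so it is worth checking that both sides use the same analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified model of analysis with a stop-word list (NOT Lucene code).
class StopWordSketch {

    // Illustrative subset of the kind of words found in default English stop lists.
    static final Set<String> DEFAULT_LIKE =
            new HashSet<>(Arrays.asList("a", "an", "and", "not", "or", "the"));

    // Hypothetical custom list that keeps "and", "or" and "not" searchable.
    static final Set<String> CUSTOM =
            new HashSet<>(Arrays.asList("a", "an", "the"));

    // Lowercases the text, splits on non-alphanumeric characters, drops stop words.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty() && !stopWords.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("not or", DEFAULT_LIKE)); // every token is a stop word
        System.out.println(analyze("not or", CUSTOM));       // both tokens survive
        System.out.println(analyze("xis Or", CUSTOM));       // lowercased before matching
    }
}
```

In this model a tag that consists only of stop words or punctuation (for example "&") produces no tokens, so no query could ever match it.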
The problem, as I said, is that there are no hits when the search is performed. I'm not sure whether this is because of the indexing or because of how the search queries are constructed (if the queries are badly constructed, there will be no hits).
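For reference, the two query strings described above can be assembled with a small helper. This is a plain-Java sketch under stated assumptions: the class and method names are hypothetical, and it only builds strings, it does not call Lucene. In Lucene's classic query syntax the + prefix marks the term that follows it as required, and clauses are separated by whitespace, so "+not +or" (with a space) is probably the intended form rather than "+not+or". Both "not AND or" and "+not +or" require every word to be present, so neither is a phrase search (that needs double quotes) nor an "any word" search (that would be "not OR or", or simply "not or" with the default operator).

```java
import java.util.ArrayList;
import java.util.List;

// Builds query strings only; the names here are hypothetical helpers.
class QueryStringSketch {

    // Splits user input on whitespace and ignores empty pieces.
    static List<String> splitWords(String userQuery) {
        List<String> words = new ArrayList<>();
        for (String w : userQuery.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // "not or" -> "not AND or": every word must match.
    static String joinWithAnd(String userQuery) {
        return String.join(" AND ", splitWords(userQuery));
    }

    // "not or" -> "+not +or": every word is marked as required.
    static String requireEach(String userQuery) {
        List<String> parts = new ArrayList<>();
        for (String w : splitWords(userQuery)) {
            parts.add("+" + w);
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        System.out.println(joinWithAnd("not or"));  // not AND or
        System.out.println(requireEach("not or"));  // +not +or
    }
}
```

Note that only the uppercase words AND, OR and NOT act as operators in that syntax; lowercase "and", "or" and "not" typed by the user are treated as ordinary terms and go through the analyzer.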
I would appreciate any help. I will gladly provide more information if needed.
Thanks a lot.
Upvotes: 1
Views: 7091
Reputation: 9320
An easy first move: use Luke (https://code.google.com/p/luke/) to look at your index. You can run your queries from Luke to check whether they find anything.
Luke is pretty easy to understand, since it has a very useful UI (https://code.google.com/p/luke/source/browse/wiki/img/overview.png)
Upvotes: 3