Rays

Reputation: 41

Lucene 6.2.1 How to get all field names or search across all fields without knowing their names

I'm new to Lucene and would like to know whether there is a way to search across all fields in multiple documents without knowing their names or, alternatively, a way to get all field names (version 6.2.1).

  1. For instance: how can I obtain all the names for the 'fields' array instead of filling them in by hand, as in the example below?

    Analyzer analyzer = new StandardAnalyzer();
    String querystr = "test";
    String[] fields = {"title","isbn","desc", "name", "surname", "description"};
    BooleanClause.Occur[] flags = new BooleanClause.Occur[fields.length];
    Arrays.fill(flags, BooleanClause.Occur.SHOULD);
    Query query = MultiFieldQueryParser.parse(querystr, fields, flags, analyzer);
    

    I have already checked those topics:

    a) How to search across all the fields?

    We have implemented this answer:

    1) Index-time approach: Use a catch-all field. This is nothing but appending all the text from all the fields (total text from your input doc) and place that resulting huge text in a single field. You've to add an additional field while indexing to act as a catch-all field.

    but we would like to change it if possible

    b) https://www.programcreek.com/java-api-examples/index.php?api=org.apache.lucene.queryParser.MultiFieldQueryParser

    c) IndexReader.getFieldNames Lucene 4

    but those solutions are not present in Lucene version 6.2.1

    IndexReader.getFieldNames() (v. 3.3.0)

    final AtomicReader reader = searcher.getAtomicReader();

    final FieldInfos infos = reader.getFieldInfos(); (v. 4.2.1)

  2. ...or is there a method (not necessarily MultiFieldQueryParser) which provides search through all fields without their names (v. 6.2.1)?

Upvotes: 4

Views: 4086

Answers (2)

femtoRgon

Reputation: 33341

If you have already implemented the solution of putting all the text you wish to search into one catch-all field, why do you want to change it? If you want to change it because it seems like a hack, let me assure you, that is the correct, best solution to this problem. That is a pattern recommended in the documentation of both Solr and ElasticSearch.

Generating a list of fields and creating a big, complicated query against all of them is the hack. You should definitely stick with the solution you have already implemented.


If you are one of the poor, unfortunate souls that just can't reindex to add a new field with all the stuff you need to search, and you really need a way to get a list of all the fields and query against them, here you go. You can get the list of fields in a LeafReader simply enough, and a DirectoryReader (from DirectoryReader.open, for example) contains a list of LeafReaderContexts. So iterate through the LeafReaders, and get and merge the list of fields from each, to get a full list of fields in the index:

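import java.nio.file.Paths;
import java.util.HashSet;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.store.FSDirectory;

// DirectoryReader.open takes a Directory, so wrap the path in an FSDirectory
DirectoryReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/my/index")));
HashSet<String> fieldnames = new HashSet<String>();
for (LeafReaderContext subReader : reader.leaves()) {
    Fields fields = subReader.reader().fields(); // iterating Fields yields the field names
    for (String fieldname : fields) {
        fieldnames.add(fieldname);
    }
}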

You could do that on application start, or when you reopen your reader, rather than every time you query. Now you have the list of field names that you could pass into MultiFieldQueryParser, or to chuck a bunch of TermQueries into a BooleanQuery or a DisjunctionMaxQuery, or some such.
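As a rough illustration of that last step, the sketch below turns a set of field names into a String[] (the form MultiFieldQueryParser takes) and into a roughly equivalent classic query string made of OR clauses. The class and method names (AllFieldsQuery, toFieldArray, toQueryString) are made up for this example, and it assumes the search term is a single word with no characters that need escaping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene.
final class AllFieldsQuery {

    // Sorted array of field names, suitable for passing to MultiFieldQueryParser.
    static String[] toFieldArray(Set<String> fieldNames) {
        return new TreeSet<>(fieldNames).toArray(new String[0]);
    }

    // Builds "f1:term OR f2:term ..." in classic Lucene query syntax.
    // Assumption: term is a single word that needs no escaping.
    static String toQueryString(Set<String> fieldNames, String term) {
        List<String> clauses = new ArrayList<>();
        for (String field : toFieldArray(fieldNames)) {
            clauses.add(field + ":" + term);
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        // Stand-in for the set collected from the LeafReaders.
        Set<String> fieldnames = new HashSet<>(Arrays.asList("title", "isbn", "desc"));
        System.out.println(toQueryString(fieldnames, "test"));
    }
}
```

Sorting the names is only there to make the output stable; the order of SHOULD clauses does not change which documents match.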

Upvotes: 3

dom

Reputation: 763

Based on your question, I take it you just want to search for some terms, and knowing which fields those values are actually indexed in isn't really important?

In this case the best approach would be to implement a normal fulltext search, modeled on how Elasticsearch or Solr handle this:

  • Add a dedicated "fulltext" TextField to each document (TextField is used for fulltext searches)
  • Fill the fulltext field with the content of all the other fields, separated by spaces
  • Search for your term in the fulltext field

This is how fulltext search can be implemented in an easy way. There is no need to know the field names and iterate over them.
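The "fill the fulltext field" step could look something like the sketch below. It is plain Java with no Lucene dependency; the class name FulltextField, the method buildFulltext, and the sample values are assumptions for illustration only. The Lucene call that would index the result is shown as a comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
final class FulltextField {

    // Joins all non-blank field values with a single space.
    static String buildFulltext(Map<String, String> fieldValues) {
        StringBuilder sb = new StringBuilder();
        for (String value : fieldValues.values()) {
            if (value == null || value.trim().isEmpty()) {
                continue; // skip missing or blank values
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(value.trim());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample values only; LinkedHashMap keeps insertion order.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Sample Title");
        doc.put("isbn", "12345");
        doc.put("desc", "");
        String fulltext = buildFulltext(doc);
        System.out.println(fulltext);
        // With Lucene on the classpath, the text would then be indexed along the lines of:
        // luceneDoc.add(new TextField("fulltext", fulltext, Field.Store.NO));
    }
}
```

Since the catch-all field is only searched and never displayed, it usually does not need to be stored, hence Field.Store.NO in the comment.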

Upvotes: 1
