I want to create a Lucene analyzer for RDF nodes. An RDF node can be one of several types (URI, blank node, plain literal, plain literal with a language tag, or typed literal with a datatype). While analyzing each term, I want to create an RDFNodeTypeAttribute, a LanguageAttribute and a DatatypeAttribute to store, respectively, the RDF node type, the literal's language and its datatype. My question is how these attributes can be stored in the Lucene index. Do I have to write a custom Codec? Do I have to use the PayloadAttribute? And once they are stored, how can I use these attributes in my searches? Thank you for your help.
I could not quite follow your requirements, but you would use a Codec if you are not happy with the way a Lucene index is encoded and decoded. A Codec gives you the flexibility to plug in your own PostingsFormat, SegmentInfoFormat, LiveDocsFormat, etc. Say you want a postings format different from the one in the default Lucene codec, which, for every term, stores the docIds it occurs in, how many times it occurs in each doc, the positions at which it occurs, and so on, in a particular on-disk format. If you want this information stored in a different format, you would need a codec.
I do not think you need to write a Codec or a PostingsFormat for this. Writing your own Analyzer and Similarity classes should probably be sufficient. If you give more details about your problem, I can think about it further.
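For each term, a custom Analyzer would have to work out which of the five node types it is, and extract the language tag or datatype, before it can fill in the attributes from the question. Below is a minimal, Lucene-free sketch of just that classification step. It assumes each token is a whole term in N-Triples syntax (for example, emitted by a KeywordTokenizer rather than a tokenizer that splits on punctuation). The names RdfNodeType, RdfNodeInfo and RdfNodeClassifier are hypothetical. In a real analyzer this logic would live inside a custom TokenFilter's incrementToken().

```java
/** The five kinds of RDF node listed in the question. */
enum RdfNodeType { URI, BNODE, PLAIN_LITERAL, LANG_LITERAL, TYPED_LITERAL }

/**
 * Hypothetical holder for what a custom TokenFilter would copy into
 * RDFNodeTypeAttribute, LanguageAttribute and DatatypeAttribute.
 * language and datatype are null when not applicable.
 */
record RdfNodeInfo(RdfNodeType type, String lexical, String language, String datatype) {}

final class RdfNodeClassifier {
    private RdfNodeClassifier() {}

    /** Classifies one term written in N-Triples syntax (escape sequences are NOT decoded). */
    static RdfNodeInfo classify(String term) {
        if (term.startsWith("<") && term.endsWith(">")) {
            // IRI, e.g. <http://example.org/a>
            return new RdfNodeInfo(RdfNodeType.URI, term.substring(1, term.length() - 1), null, null);
        }
        if (term.startsWith("_:")) {
            // Blank node, e.g. _:b0
            return new RdfNodeInfo(RdfNodeType.BNODE, term.substring(2), null, null);
        }
        if (term.startsWith("\"")) {
            // Literal: the lexical form ends at the last double quote
            int close = term.lastIndexOf('"');
            if (close > 0) {
                String lexical = term.substring(1, close);
                String rest = term.substring(close + 1);
                if (rest.isEmpty()) {
                    return new RdfNodeInfo(RdfNodeType.PLAIN_LITERAL, lexical, null, null);
                }
                if (rest.startsWith("@") && rest.length() > 1) {
                    // Language-tagged literal, e.g. "chat"@fr
                    return new RdfNodeInfo(RdfNodeType.LANG_LITERAL, lexical, rest.substring(1), null);
                }
                if (rest.startsWith("^^<") && rest.endsWith(">")) {
                    // Typed literal, e.g. "42"^^<http://www.w3.org/2001/XMLSchema#integer>
                    return new RdfNodeInfo(RdfNodeType.TYPED_LITERAL, lexical, null,
                            rest.substring(3, rest.length() - 1));
                }
            }
        }
        throw new IllegalArgumentException("Not an N-Triples term: " + term);
    }
}
```

Keeping the classification separate from Lucene makes it easy to unit-test on its own. The TokenFilter then only has to copy the fields of RdfNodeInfo into the corresponding attributes.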
Payloads work at the term level (strictly, per term position), and the typical use case is storing metadata for each term occurrence. For example, "this term is written in bold" or "this term is a noun" is metadata about the term and belongs in a payload. Payloads are mainly used for scoring documents: they let you give a particular term occurrence more or less weight.
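To illustrate what storing such metadata in a payload could look like, here is a minimal sketch of one possible byte layout: one byte for the node type, followed by the UTF-8 language tag or datatype IRI. The class name RdfPayload, the type codes and the layout are all hypothetical, not a Lucene format. In a TokenFilter the encoded bytes would be wrapped in a BytesRef and passed to PayloadAttribute.setPayload. The decode method takes (bytes, offset, length) to mirror BytesRef's fields.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical payload layout for one term position:
 * byte 0 = node type code, bytes 1..n = UTF-8 language tag or datatype IRI
 * (empty when the node has neither).
 */
final class RdfPayload {
    // Hypothetical type codes, one per RDF node kind
    static final byte URI = 0;
    static final byte BNODE = 1;
    static final byte PLAIN_LITERAL = 2;
    static final byte LANG_LITERAL = 3;
    static final byte TYPED_LITERAL = 4;

    final byte type;
    final String extra; // language tag or datatype IRI; "" when not applicable

    RdfPayload(byte type, String extra) {
        this.type = type;
        this.extra = extra == null ? "" : extra;
    }

    /** Bytes that would be wrapped in a BytesRef for PayloadAttribute.setPayload. */
    byte[] encode() {
        byte[] extraBytes = extra.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[1 + extraBytes.length];
        out[0] = type;
        System.arraycopy(extraBytes, 0, out, 1, extraBytes.length);
        return out;
    }

    /** Decodes a payload slice; (bytes, offset, length) mirrors BytesRef's fields. */
    static RdfPayload decode(byte[] bytes, int offset, int length) {
        if (length < 1) {
            throw new IllegalArgumentException("payload must contain at least the type byte");
        }
        String extra = new String(bytes, offset + 1, length - 1, StandardCharsets.UTF_8);
        return new RdfPayload(bytes[offset], extra);
    }
}
```

This layout repeats a full datatype IRI at every position, which bloats the postings. Mapping each datatype to a small integer code would keep each payload to a byte or two.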
Although RDF is metadata about a web resource, you are probably talking about indexing the RDF itself. Even if the RDF is part of the web document you are indexing, attaching the RDF information to every term in that document is not a viable approach, as there are better ways to assign weights to a document.