Does Apache Solr allow this: returning to the user, in addition to the document translated into French, the original text and the contexts of use (highlighted passages) in the original text?
The documents to be indexed are PDF files.
Edit: added an example
I have the original document doc_eng.pdf and the translated document doc_fr.pdf. When doc_fr.pdf is returned in a query response, I want to be able to get doc_eng.pdf too, along with its context (highlighting) if possible.
My suggestions:
1- Map doc_fr.pdf and doc_eng.pdf to the same id (if this can be done) and add a boolean field isOriginal = true|false.
2- Use nested documents (but I don't understand how this would work with PDF files).
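A minimal sketch of suggestion 1, with one adjustment: in Solr, sending a document whose uniqueKey (usually `id`) already exists replaces the existing document, so the two PDFs cannot literally share an `id`. The sketch below assumes a hypothetical extra field `group_id` that links the pair, plus the `isOriginal` flag from the question and a hypothetical stored text field `content`. It only builds the update payload and query parameters; it does not talk to Solr.

```python
import json
from urllib.parse import urlencode


def build_linked_docs(group_id, original_text, translated_text):
    """Return an English original and its French translation as two Solr docs.

    Each doc gets its own uniqueKey (id); the hypothetical group_id field is
    what links them, because re-using one id would overwrite the first doc.
    """
    return [
        {"id": f"{group_id}_eng", "group_id": group_id,
         "isOriginal": True, "content": original_text},
        {"id": f"{group_id}_fr", "group_id": group_id,
         "isOriginal": False, "content": translated_text},
    ]


def translation_search_params(user_query):
    """Step 1: search only the French translations, with highlighting."""
    return {
        "q": f"content:({user_query})",
        "fq": "isOriginal:false",
        "fl": "id,group_id",
        "hl": "true",
        "hl.fl": "content",
    }


def original_lookup_params(group_id):
    """Step 2: fetch the English original that shares the hit's group_id."""
    return {
        "q": f'group_id:"{group_id}"',
        "fq": "isOriginal:true",
        "fl": "id,content",
    }


if __name__ == "__main__":
    docs = build_linked_docs("doc", "Original English text", "Texte traduit en français")
    print(json.dumps(docs))  # e.g. body for a JSON POST to the core's /update handler
    print(urlencode(translation_search_params("contrat")))
    print(urlencode(original_lookup_params("doc")))
```

Step 2 does not highlight anything: the French query terms generally won't occur in the English text, so context in the original would need a separate approach.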
Yes, Solr can do this. I would suggest using the Apache Tika mechanism.
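Since the documents are PDFs, one common way to get their text into Solr is Solr Cell (the ExtractingRequestHandler), which uses Apache Tika to parse the file. The sketch below only builds the HTTP request. It assumes the handler is enabled at `/update/extract` (as in Solr's sample configs), a hypothetical core URL, and the hypothetical `group_id`/`isOriginal` fields. `literal.<field>` is the Solr Cell parameter for setting a field value on the extracted document.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(core_url, pdf_bytes, doc_id, group_id, is_original):
    """Build (but do not send) a Solr Cell request that indexes one PDF.

    literal.<field> sets a field value on the indexed document; group_id and
    isOriginal are hypothetical linking fields, not Solr built-ins.
    """
    params = {
        "literal.id": doc_id,
        "literal.group_id": group_id,
        "literal.isOriginal": "true" if is_original else "false",
        "commit": "true",
    }
    return Request(
        f"{core_url}/update/extract?{urlencode(params)}",
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_extract_request(
        "http://localhost:8983/solr/mycore",  # hypothetical core URL
        b"%PDF-1.4 ...",                      # placeholder bytes; read the real file instead
        doc_id="doc_fr", group_id="doc", is_original=False,
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it to a running Solr instance.
```

Indexing doc_eng.pdf works the same way with `doc_id="doc_eng"` and `is_original=True`.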
Solr can identify languages and map text to language-specific fields during indexing using the langid UpdateRequestProcessor.
Solr supports two implementations of this feature (see the [Solr 7.2 Reference Guide](https://lucene.apache.org/solr/guide/7_2/language-analysis.html)):
- Tika's language detection feature
- [LangDetect language detection](https://github.com/shuyo/language-detection)
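To illustrate what "map text to language-specific fields" means, here is a plain-Python imitation of the renaming the langid processor performs with `langid.map=true`: by default a field listed in `langid.fl` is renamed to `<field>_<language>`, the original field is dropped unless `langid.map.keepOrig=true`, and the detected code is written to the field named by `langid.langField`. This is not Solr's code; detection itself is not simulated (the language is passed in), and the function name and the `language` field name are hypothetical. The mapped fields (e.g. `content_fr`, `content_en`) would also need to exist in the schema, for example as dynamic fields.

```python
def map_fields_by_language(doc, fl, lang, lang_field="language", keep_orig=False):
    """Mimic the field renaming langid does with langid.map=true.

    `lang` stands in for the detector's result; no detection happens here.
    """
    mapped = {}
    for name, value in doc.items():
        if name in fl:
            mapped[f"{name}_{lang}"] = value  # default pattern: <field>_<language>
            if keep_orig:
                mapped[name] = value          # like langid.map.keepOrig=true
        else:
            mapped[name] = value
    mapped[lang_field] = lang                 # like langid.langField
    return mapped


if __name__ == "__main__":
    fr = map_fields_by_language({"id": "doc_fr", "content": "Bonjour"}, ["content"], "fr")
    en = map_fields_by_language({"id": "doc_eng", "content": "Hello"}, ["content"], "en")
    print(fr)
    print(en)
```

With this mapping, the French translation and the English original end up in separate fields, so each can use its own language-specific analyzer.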