elouanesbg

Reputation: 66

Apache Solr for indexing translated documents

Does Apache Solr support the following?

When returning a document translated into French to the user, also return the original text, along with the contexts of use (the matching passages) in that original text.

The documents to be indexed are PDF files.

Edit: added an example

I have the original document doc_eng.pdf and the translated document doc_fr.pdf.

When doc_fr.pdf is returned in a query response, I want to be able to get doc_eng.pdf as well, together with the context (highlighting), if possible.

My suggestions:

1- Map doc_fr.pdf and doc_eng.pdf to the same id (if this can be done) and add a boolean field isOriginal = true|false.

2- Use nested documents (but I don't see how this would work with PDF files).
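A note on suggestion 1: Solr treats the uniqueKey field (usually id) as the document's identity, so indexing two documents with the same id makes the second replace the first. A workable variant is to give each PDF its own id and link the pair through a shared field. The Python sketch below only builds the URLs for Solr Cell's /update/extract handler, which runs Tika on the uploaded file; literal.* attaches fixed field values and fmap.content renames Tika's extracted content field. The core URL, the field names group_id, lang, is_original, content_en and content_fr, and the helper names are all hypothetical and must match your own schema.

```python
from urllib.parse import urlencode

# Hypothetical core URL; group_id, lang, is_original, content_en and
# content_fr are assumed field names that must exist in the schema.
SOLR_CORE = "http://localhost:8983/solr/docs"


def extract_url(doc_id, group_id, lang, is_original):
    """Build a Solr Cell (/update/extract) URL for one PDF.

    Tika puts the extracted text in the "content" field; fmap.content
    renames it to a language-specific field such as content_en.
    Each literal.* parameter sets a fixed field value on the document.
    """
    params = {
        "literal.id": doc_id,  # distinct id per PDF, so neither overwrites the other
        "literal.group_id": group_id,  # shared key linking original and translation
        "literal.lang": lang,
        "literal.is_original": "true" if is_original else "false",
        "fmap.content": f"content_{lang}",
        "commit": "true",
    }
    return f"{SOLR_CORE}/update/extract?{urlencode(params)}"


def pair_urls(group_id, original_pdf, translated_pdf):
    """Return (url, pdf_path) pairs for an English original and its French translation."""
    return [
        (extract_url(f"{group_id}_en", group_id, "en", True), original_pdf),
        (extract_url(f"{group_id}_fr", group_id, "fr", False), translated_pdf),
    ]
```

Each PDF would then be POSTed to its URL as a multipart upload, e.g. `curl "<url>" -F "myfile=@doc_fr.pdf"`, as in the Solr Cell examples. Nested documents could also model the pair, but since Solr Cell indexes one file per request, flat documents linked by a shared field are likely the simpler route.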

Upvotes: 1

Views: 230

Answers (1)

Gibbs

Reputation: 22956

Yes, Solr can do this. I would suggest using the Apache Tika mechanism.

Solr can identify languages and map text to language-specific fields during indexing using the langid UpdateRequestProcessor.
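Once each document's text lands in a language-specific field (through langid's field mapping or an explicit mapping at indexing time), returning the original next to a French hit can be done with a second query. The Python sketch below only builds the two query URLs; the core URL, the field names content_fr, content_en, group_id and is_original, and the helper names are hypothetical. Because the user's query is in French, highlighting it against English text would usually match nothing, so the sketch passes separate English terms through Solr's hl.q parameter; where those terms come from (the user, a glossary, a translation step) is left open. The highlighted field must be stored.

```python
from urllib.parse import urlencode

# Hypothetical core URL and field names; adapt to your schema.
SOLR_CORE = "http://localhost:8983/solr/docs"


def search_french(user_query):
    """Step 1: search the French text, returning each hit's id and group."""
    params = {"q": f"content_fr:({user_query})", "fl": "id,group_id", "wt": "json"}
    return f"{SOLR_CORE}/select?{urlencode(params)}"


def original_with_highlights(group_id, english_terms, snippets=3):
    """Step 2: fetch the original in a hit's group, highlighting English terms.

    q matches every document; the two filter queries narrow the result to
    the original in the same group, and hl.q supplies the terms to highlight.
    """
    params = {
        "q": "*:*",
        "fq": [f"group_id:{group_id}", "is_original:true"],
        "fl": "id,group_id",
        "hl": "true",
        "hl.fl": "content_en",
        "hl.q": f"content_en:({english_terms})",
        "hl.snippets": str(snippets),
        "wt": "json",
    }
    # doseq=True emits one fq=... pair per list element.
    return f"{SOLR_CORE}/select?{urlencode(params, doseq=True)}"
```

For a French hit with group_id doc1, `original_with_highlights("doc1", "energy policy")` builds a request whose JSON response should contain the English document and, under its `highlighting` section, snippets taken from content_en.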

Solr supports two implementations of this feature:

- Tika's language detection feature
- [LangDetect language detection](https://github.com/shuyo/language-detection)

Refer to the [Solr Reference Guide: Language Analysis](https://lucene.apache.org/solr/guide/7_2/language-analysis.html).

Translator

Upvotes: 1
