DavidVdd

Reputation: 1018

How to index each page of a PDF document as a separate Solr document

I'm trying to retrieve the page number on which a search result was found in Solr. I have found that indexing each page as a separate Solr document would work, but I can't find a way to index a single page from a PDF file.

Has anyone found a way to index a single page of a document with Solr?

Upvotes: 0

Views: 113

Answers (1)

Persimmonium

Reputation: 15789

You can use any PDF library, for example PDFBox, to extract the text from each page separately and submit each page to Solr as a distinct document.
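As a rough sketch of the second half of that idea, the snippet below turns a list of per-page texts into one Solr document per page. It assumes the page texts were already extracted (with PDFBox this is typically done by setting `setStartPage` and `setEndPage` on a `PDFTextStripper` to the same page before calling `getText`). The function name `build_page_documents` and the field names `doc_name`, `page` and `content` are made up for illustration and would need to match your own schema.

```python
import json


def build_page_documents(doc_name, page_texts):
    """Return one Solr document (a dict) per page; pages are numbered from 1."""
    docs = []
    for page_number, text in enumerate(page_texts, start=1):
        docs.append({
            # Unique key per page, so each page is a separate Solr document.
            "id": f"{doc_name}_p{page_number}",
            # Hypothetical field names; adjust them to your schema.
            "doc_name": doc_name,
            "page": page_number,
            "content": text,
        })
    return docs


# Placeholder page texts standing in for the output of a PDF library.
pages = ["Text of page one.", "Text of page two."]

# A JSON array of documents is one of the formats Solr's update handler accepts.
payload = json.dumps(build_page_documents("manual.pdf", pages), indent=2)
print(payload)
```

The JSON array can then be posted to the core's `/update` handler with `Content-Type: application/json`. Because the page number is stored as a field, a hit on `content` brings back the page it came from.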

Upvotes: 1
