Reputation: 131
I am using Apache PDFBox to read a PDF document that has a hierarchy defined by bookmarks. The hierarchy is in a tree form with contents only at the leaf level.
When I try to extract the text between two leaf-level bookmarks using the following calls:
Stripper.setStartBookmark(),
Stripper.setEndBookmark(),
Stripper.writeText(),
I get the text of the whole page instead. In short, my problem is similar to the one mentioned in this thread.
Is there a way to extract the contents between two bookmarks?
If so, what should be the change in my code?
Upvotes: 6
Views: 2235
Reputation: 1441
My guess is that your bookmarks do not contain the data needed for this.
It sounds like each bookmark you are using only points to the page where your content starts, rather than to a location on that page.
Here is an example of a bookmark that contains location data:
<Title Action="GoTo" Style="bold" Page="2 FitH 518">
Title Name
</Title>
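To check whether a bookmark carries location data, you can look at what follows the page number in the destination. Below is a minimal sketch that parses a value shaped like the Page attribute above. It assumes the value is always "page fit-type coordinates", as in my example, and that the fit types follow the PDF destination types (XYZ, Fit, FitH, FitV, FitR, FitB, FitBH, FitBV). The class BookmarkDestination is a hypothetical helper of mine, not part of PDFBox.

```java
import java.util.OptionalDouble;

// Hypothetical helper (not a PDFBox class): parses a destination string such as
// "2 FitH 518" and reports whether it carries a vertical position on the page.
final class BookmarkDestination {
    final int page;
    final String fitType;      // empty when the value holds only a page number
    final OptionalDouble top;  // vertical coordinate, if the destination has one

    private BookmarkDestination(int page, String fitType, OptionalDouble top) {
        this.page = page;
        this.fitType = fitType;
        this.top = top;
    }

    static BookmarkDestination parse(String value) {
        String[] tokens = value.trim().split("\\s+");
        int page = Integer.parseInt(tokens[0]);
        String fitType = tokens.length > 1 ? tokens[1] : "";

        // Index of the "top" coordinate for the destination types that have one.
        int topIndex;
        switch (fitType) {
            case "FitH":
            case "FitBH":
                topIndex = 2;   // page FitH top
                break;
            case "XYZ":
                topIndex = 3;   // page XYZ left top zoom
                break;
            case "FitR":
                topIndex = 5;   // page FitR left bottom right top
                break;
            default:
                topIndex = -1;  // Fit, FitB, FitV, FitBV or page only: no top value
        }

        OptionalDouble top = OptionalDouble.empty();
        if (topIndex > 0 && topIndex < tokens.length) {
            try {
                top = OptionalDouble.of(Double.parseDouble(tokens[topIndex]));
            } catch (NumberFormatException e) {
                // Non-numeric coordinate (for example "null"): treat as no position.
            }
        }
        return new BookmarkDestination(page, fitType, top);
    }

    boolean hasVerticalPosition() {
        return top.isPresent();
    }

    public static void main(String[] args) {
        for (String value : new String[] {"2 FitH 518", "2 Fit", "2"}) {
            BookmarkDestination d = parse(value);
            System.out.println(value + " -> page " + d.page
                    + ", has vertical position: " + d.hasVerticalPosition());
        }
    }
}
```

If hasVerticalPosition() is false for your start and end bookmarks, they only identify a page, which would match the behaviour you describe. If you read the outline through PDFBox rather than from an exported string, the same check would be made on the destination object of each outline item; that part is not shown here.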
Upvotes: 0