Reputation: 23
I am using Mozenda (Mozenda.com) to scrape an online database, but some of the data is in PDF files. Mozenda does not appear to support scraping these files, so I am looking for another solution.
I have two questions:
1. What is the appropriate XPath syntax to select the URL from a link? It is not clear how to do this with Mozenda, and the PDF URLs are necessary to implement a third-party solution.
2. What is a good tool to convert large numbers of online PDFs into HTML or, better yet, to scrape them directly?
Any helpful suggestions are most certainly appreciated. I am happy to clarify...just ask.
Upvotes: 1
Views: 641
Reputation: 626
I recognize this is a LATE answer, but Mozenda has since added the ability to convert PDFs to HTML and scrape from them. It's pretty easy.
Upvotes: 1
Reputation: 11
You can create the XPath within Mozenda itself: create any action, choose Refine Action, enter . in the XPath field, and then capture whatever data you need from the CaptureDefinition.
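For reference, in standard XPath the URL of a link is its href attribute, so an expression such as //a/@href selects it. Whether Mozenda's XPath field accepts attribute selection in that form is an assumption I have not verified. Outside Mozenda, a minimal sketch of the same idea in Python might look like the following. The sample markup, the base URL, and the helper name pdf_links are all hypothetical, and because the standard library's ElementTree only supports a subset of XPath, the sketch matches .//a[@href] and then reads the attribute rather than selecting @href directly.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical, well-formed sample markup standing in for a scraped page.
SAMPLE = """<div>
  <a href="/docs/report-1.pdf">Report 1</a>
  <a href="https://example.com/docs/report-2.pdf">Report 2</a>
  <a href="/about.html">About</a>
  <a>No link here</a>
</div>"""


def pdf_links(markup, base_url):
    """Return absolute URLs of all links whose href ends in .pdf."""
    root = ET.fromstring(markup)
    links = []
    # ElementTree's XPath subset cannot return attributes, so match the
    # <a> elements that carry an href and read the attribute afterwards.
    for anchor in root.iterfind(".//a[@href]"):
        href = anchor.get("href")
        if href.lower().endswith(".pdf"):
            # Resolve relative links against the page they came from.
            links.append(urljoin(base_url, href))
    return links


if __name__ == "__main__":
    for url in pdf_links(SAMPLE, "https://example.com/search"):
        print(url)
```

Note that ElementTree requires well-formed markup, so real-world HTML would usually need a more forgiving parser; the resulting list of URLs can then be handed to whatever PDF conversion tool you choose.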
Upvotes: 0