Christopher

Reputation: 1

Extract Text Data from Document using Apache Tika

I am looking for a tool to extract text data from documents. Specifically, I would like to extract metadata from invoices, such as Invoice Number, Vendor Name, Invoice Date, Due Date, and Amount Due. Because the invoices come from my vendors, these fields are located in different areas of each document. I have not been able to determine whether Tika can search a document for a keyword such as INVOICE and then extract the invoice number. I would then like to push the document and its metadata to a document management system such as SharePoint or Alfresco. Does anyone have experience with Tika, and do you know if this is possible?
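For context on the keyword part of the question: Tika returns a document's plain text and file-level metadata (content type, author, and so on), so locating a label such as INVOICE and reading the value after it would be a separate step run on that text. Below is a minimal sketch of that step in Python, assuming the text has already been extracted (for example with `parseToString` on Tika's Java `Tika` facade). The field names, the regular expressions, and the sample invoice are all hypothetical and would need tuning for each vendor layout.

```python
import re

# Hypothetical date formats: ISO (2020-01-15) or slash-separated (1/15/2020).
DATE = r"(\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{2,4})"

# Hypothetical label patterns, one per field. Each captures the value that
# follows the label, wherever the label appears in the text.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:number|no\b\.?|#)\s*:?\s*([A-Za-z0-9-]+)", re.IGNORECASE
    ),
    "vendor_name": re.compile(r"vendor\s*(?:name)?\s*:\s*(.+)", re.IGNORECASE),
    "invoice_date": re.compile(r"invoice\s*date\s*:?\s*" + DATE, re.IGNORECASE),
    "due_date": re.compile(r"due\s*date\s*:?\s*" + DATE, re.IGNORECASE),
    "amount_due": re.compile(
        r"amount\s*due\s*:?\s*\$?\s*([\d,]+(?:\.\d{2})?)", re.IGNORECASE
    ),
}


def extract_fields(text):
    """Return a dict of field name -> extracted string, or None if not found."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1).strip() if match else None
    return fields


# Made-up sample standing in for text that Tika extracted from an invoice.
SAMPLE_TEXT = """ACME Supplies Ltd.
INVOICE
Invoice Number: INV-10023
Vendor Name: ACME Supplies Ltd.
Invoice Date: 2020-01-15
Due Date: 2020-02-14
Amount Due: $1,250.00
"""

fields = extract_fields(SAMPLE_TEXT)
```

Because each pattern searches the whole text, the position of a field on the page does not matter, only that a recognizable label precedes the value. Scanned invoices would need OCR before this step, and layouts without labels would need a different approach.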

Upvotes: 0

Views: 923

Answers (2)

Zhavat

Reputation: 214

This may be late, but it could be useful for other visitors to know about Algodocs, which also offers a free-forever subscription: https://algodocs.com. Algodocs offers an all-in-one solution, i.e. you can extract the specific fields you need, or table rows, from images or PDF files with hundreds of pages.

Upvotes: 0

Krutik Jayswal

Reputation: 3175

You can use Ephesoft and Alfresco.

Using Ephesoft: you can extract the data.
Using Alfresco: you can store the extracted data with the document.
It's good compared to Tika.

Watch the video below.
https://www.youtube.com/watch?v=soV-9GGhuBg

Upvotes: 0
