Reputation: 1
I am looking for a tool to extract text data from documents. Specifically, I would like to extract metadata from invoices, such as Invoice Number, Vendor Name, Invoice Date, Due Date, Amount Due, and so on. Because the invoices come from different vendors, these fields will be located in different areas of each document. I have not been able to determine whether Tika can search a document for a keyword such as INVOICE and then extract the invoice number. After extracting this data, I would like to push the document and its metadata to a document management system such as SharePoint or Alfresco. Does anyone have experience with Tika, and is this possible?
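To make the goal concrete, here is a rough sketch of the kind of keyword lookup I have in mind. It assumes the plain text has already been pulled out of the document (by Tika or anything else) and only shows the "find the label, grab the value next to it" step. The label patterns, the `extract_invoice_fields` helper, and the sample text are all made up for illustration; real vendor layouts would need their own patterns, and Vendor Name is left out because it usually has no label to search for.

```python
import re

# Hypothetical label patterns. These are assumptions for illustration only;
# each vendor's layout will likely need its own tuning.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:number|no\.?|#)\s*[:\-]?\s*([A-Z0-9][A-Z0-9\-]*)",
        re.IGNORECASE,
    ),
    "invoice_date": re.compile(
        r"invoice\s*date\s*[:\-]?\s*(\d{1,2}/\d{1,2}/\d{2,4})",
        re.IGNORECASE,
    ),
    "due_date": re.compile(
        r"due\s*date\s*[:\-]?\s*(\d{1,2}/\d{1,2}/\d{2,4})",
        re.IGNORECASE,
    ),
    "amount_due": re.compile(
        r"amount\s*due\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})",
        re.IGNORECASE,
    ),
}


def extract_invoice_fields(text):
    """Return a dict of field name -> first matched value, or None if absent."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


# Made-up text standing in for what a text extractor might return.
SAMPLE_TEXT = """ACME Supplies
INVOICE
Invoice Number: INV-10042
Invoice Date: 03/15/2021
Due Date: 04/14/2021
Amount Due: $1,250.00
"""

if __name__ == "__main__":
    for field, value in extract_invoice_fields(SAMPLE_TEXT).items():
        print(field, "=", value)
```

Because the patterns search for the label anywhere in the text, the position of a field on the page should not matter, as long as the value sits right after its label in the extracted text.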
Upvotes: 0
Views: 923
Reputation: 214
This may be late, but other visitors may find it useful to know about Algodocs, which also offers a permanently free subscription: https://algodocs.com. Algodocs is an all-in-one solution: you can extract the specific fields you need, or table rows, from images or PDF files, including files with hundreds of pages.
Upvotes: 0
Reputation: 3175
You can use Ephesoft together with Alfresco.
With Ephesoft, you can extract the data.
With Alfresco, you can store the extracted data alongside the document.
This combination works well compared to Tika.
See the video below:
https://www.youtube.com/watch?v=soV-9GGhuBg
Upvotes: 0