Text extraction library for different file types (PDF, DOC, DOCX, TXT) in C#

Question

I'm building an information retrieval system that searches text in multiple file formats. I tried the EPocalipse IFilter library, but it throws an exception when trying to read DOCX files. I tried the Toxy library, but it throws an exception for Arabic DOC files. Finally, I tried the TikaOnDotNet library, but it needs Java to work, and I need to put the system online on a hosting server that doesn't have Java installed.

user6522773 · Accepted Answer

What about using a separate library for each format, such as these:

For DOC/DOCX: http://www.dotnetperls.com/word

For PDF: https://github.com/itext/itextsharp

For TXT: https://msdn.microsoft.com/en-us/library/ms143368(v=vs.110).aspx
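One way to tie per-format readers together is to dispatch on the file extension. The following is only a rough sketch, not a tested solution: the `TextExtractor` class and its method names are made up for illustration. TXT goes through `File.ReadAllText`. For DOCX it does not use the library linked above; instead it assumes the file is a standard Open XML package and reads `word/document.xml` straight out of the ZIP container with the .NET base class library (on .NET Framework this needs a reference to System.IO.Compression.FileSystem). PDF and DOC are left unhandled here and would call into whichever library you pick for them.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, named for illustration only.
public static class TextExtractor
{
    // WordprocessingML namespace used by the elements in word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Picks an extraction routine based on the file extension.
    public static string Extract(string path)
    {
        switch (Path.GetExtension(path).ToLowerInvariant())
        {
            case ".txt":
                // Detects a UTF-8/UTF-32 byte order mark, otherwise assumes UTF-8.
                return File.ReadAllText(path);
            case ".docx":
                return ExtractDocx(path);
            default:
                // PDF and DOC need a dedicated library; plug those calls in here.
                throw new NotSupportedException("No extractor wired up for " + path);
        }
    }

    // Assumption: a DOCX file is a ZIP archive whose main body is word/document.xml.
    // Limitation: paragraphs nested in text boxes would be emitted twice;
    // headers, footers and footnotes live in other parts and are ignored.
    private static string ExtractDocx(string path)
    {
        using (ZipArchive zip = ZipFile.OpenRead(path))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found in " + path);

            using (Stream stream = entry.Open())
            {
                XDocument doc = XDocument.Load(stream);

                // One output line per paragraph (w:p), joining its text runs (w:t).
                var paragraphs = doc.Descendants(W + "p")
                    .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));

                return string.Join(Environment.NewLine, paragraphs);
            }
        }
    }
}
```

Calling `TextExtractor.Extract(path)` on each file then gives you a single string to index, and unsupported extensions fail loudly instead of returning empty text.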
