Alaa M. Tekleh

Reputation: 7926

Text extraction library for different file types (PDF, DOC, DOCX, TXT) in C#

I'm building an information retrieval system that searches text in multiple file formats. I tried the EPocalipse IFilter library, but it throws an exception when trying to read DOCX files. I tried the Toxy library, but it throws an exception for Arabic DOC files. Finally, I tried the TikaOnDotNet library, but it needs Java to work, and I need to put the system online on a hosting server that doesn't have Java installed.

Upvotes: 5

Views: 1986

Answers (2)

Debasis

Reputation: 3750

Apache Tika is a library that can extract textual data from a very wide range of file types. It can even extract metadata (if any) from non-text files such as images and videos. Example use cases are shown here.
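As a rough illustration of calling Tika from C#, here is a minimal sketch that assumes the TikaOnDotNet.TextExtraction NuGet package (the .NET wrapper mentioned in the question) is referenced. The file name `sample.docx` is a hypothetical placeholder, and whether this runs on a host without Java depends on how the wrapper is packaged, so treat it as a starting point rather than a verified solution.

```csharp
using System;
using TikaOnDotNet.TextExtraction;

class Program
{
    static void Main(string[] args)
    {
        // Hypothetical input path; pass a real PDF, DOC, DOCX, or TXT file as the first argument.
        string path = args.Length > 0 ? args[0] : "sample.docx";

        // TextExtractor detects the file type and extracts its text and metadata.
        var extractor = new TextExtractor();
        TextExtractionResult result = extractor.Extract(path);

        // ContentType is the detected MIME type; Text is the extracted plain text.
        Console.WriteLine("Detected type: " + result.ContentType);
        Console.WriteLine(result.Text);

        // Metadata holds key/value pairs such as author or creation date, when present.
        foreach (var entry in result.Metadata)
        {
            Console.WriteLine(entry.Key + " = " + entry.Value);
        }
    }
}
```

The same `Extract` call is used for every format, so the indexing code does not need a separate branch per file type.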

Upvotes: 2
