Steve Chapman

Reputation: 1317

Reading file contents using C#

I want to read the contents of the following file types using C#:

  1. RTF
  2. PDF
  3. HTML
  4. MS Word

Is there a common API in .NET for reading the contents of all of these file types?

Upvotes: 3

Views: 1294

Answers (3)

Nick Josevski

Reputation: 4216

I've used Aspose before. It's a very powerful product, but it's reasonably pricey, so I would only recommend it if your application also needs to create new Word/PDF/RTF documents.

I agree with the other comments about just using System.IO for reading HTML files.

Upvotes: 1

D'Arcy Rittich

Reputation: 171351

If you are going to full-text index the data, look into Lucene (its .NET port is Lucene.Net); it can handle those file types.

Upvotes: 0

Bob

Reputation: 99684

There is no built-in support for reading most of those file types. HTML is plain text, so you can read it with System.IO (for example, a StreamReader), but you must parse it yourself.
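A minimal sketch of the StreamReader approach, assuming the file is UTF-8 (or starts with a byte-order mark). The `HtmlTextReader` class and its `ReadHtml`/`StripTags` helpers are hypothetical names for illustration. The regex-based stripping only stands in for the "parse it yourself" step: it drops `<script>`/`<style>` blocks and tags, decodes entities, and collapses whitespace, but it is not a real HTML parser and will mishandle comments, CDATA, and malformed markup.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the .NET Framework.
public static class HtmlTextReader
{
    // Reads the raw HTML markup. StreamReader honours a BOM if present,
    // otherwise it decodes the bytes as UTF-8 (an assumption about the file).
    public static string ReadHtml(string path)
    {
        using (var reader = new StreamReader(path, Encoding.UTF8, detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }

    // Naive text extraction, NOT a full HTML parser:
    // 1. remove <script>...</script> and <style>...</style> blocks,
    // 2. replace every remaining tag with a space,
    // 3. decode entities such as &amp;,
    // 4. collapse runs of whitespace.
    public static string StripTags(string html)
    {
        string noScripts = Regex.Replace(
            html,
            @"<(script|style)\b[^>]*>.*?</\1\s*>",
            " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        string noTags = Regex.Replace(noScripts, @"<[^>]+>", " ");
        string decoded = WebUtility.HtmlDecode(noTags);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

Calling `HtmlTextReader.StripTags(HtmlTextReader.ReadHtml(path))` yields roughly the visible text of the page; for real-world HTML, a dedicated HTML parser library would be a more robust choice than regular expressions.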

There are third-party components that will read these file types, but I am not sure whether there is a single all-encompassing component.

For PDFs, I believe iTextSharp lets you read them.

For RTF/Word, you can use the Word Primary Interop Assemblies, which automate an installed copy of Microsoft Word.

Upvotes: 2
