user1999722

Reputation: 181

Reading doc and docx files using C# without having MS Office installed on server

I'm working on a project (ASP.NET, C#, vb 2010, .NET 4) and I need to read both DOC and DOCX files that I've previously uploaded (the upload part is already done). The tricky part is that MS Office is not installed on the server and I can't use it.

Is there any public library that I can include in my project without having to install anything? Both documents are very simple:

NUMBER TAB STRING  
NUMBER TAB STRING  
NUMBER TAB STRING  
...  

I need to extract the number and the string from each row (paragraph).

Can someone help with this? To repeat: I can't install anything on the server.

Upvotes: 9

Views: 25051

Answers (4)

Jaydeep Solanki

Reputation: 1

You can do it like this with Spire.Doc:

using System.IO;
using System.Text;
using Spire.Doc;
    
namespace ReadTextLineByLine{
    class Program {
        static void Main(string[] args) {
            //Create a Document object
            Document doc = new Document();
            //Load a Word file
            doc.LoadFromFile(@"C:\Users\Administrator\Desktop\data.docx");
            //Convert the text in Word line by line into a txt file
            doc.SaveToTxt("result.text", Encoding.UTF8);
            //Read all lines of the txt file, using the same encoding it was written with
            string[] lines = File.ReadAllLines("result.text", Encoding.UTF8);
        }
    }
}
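
Once the text is available as an array of lines (such as `lines` above), each row still has to be split into its number and string. Here is a minimal sketch of that step. It assumes the text export keeps the tab character between the two fields and that the number is an integer; `RowParser` and `Parse` are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;

static class RowParser
{
    // Splits each "NUMBER<TAB>STRING" line on the first tab only,
    // so any further tabs stay inside the string part.
    public static List<KeyValuePair<int, string>> Parse(IEnumerable<string> lines)
    {
        var rows = new List<KeyValuePair<int, string>>();
        foreach (string line in lines)
        {
            string[] parts = line.Split(new[] { '\t' }, 2);
            int number;
            // Lines without a tab or without a leading integer are skipped.
            if (parts.Length == 2 && int.TryParse(parts[0].Trim(), out number))
            {
                rows.Add(new KeyValuePair<int, string>(number, parts[1].Trim()));
            }
        }
        return rows;
    }
}
```

Calling `RowParser.Parse(lines)` then gives one key/value pair per valid row; blank or malformed lines are silently dropped, which may or may not be what you want.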

Upvotes: -1

Pavel Kudinov

Reputation: 405

You can now use NPOI, an open source .NET port of Apache POI, which supports docx as well as xls and xlsx. DocX is another open source library for creating Word documents.

For DOCX I'd suggest the Open XML API. Microsoft developed Open XML so that Office files can be created and read through their underlying XML, and this API works directly on those XML parts. Note that the latest version, 2.5, was released in 2013, which is 5 years ago.
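
As a rough sketch of the Open XML route for the DOCX case, the following reads every body paragraph as plain text. It assumes the project references the Open XML SDK assembly (`DocumentFormat.OpenXml`), which is an ordinary DLL and does not need Office. The class and method names (`DocxText`, `ReadParagraphs`, `ParagraphText`) are made up for illustration. Tabs are stored in the file as separate `<w:tab/>` elements rather than as characters, so the sketch converts them back to `'\t'` explicitly.

```csharp
using System.Collections.Generic;
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class DocxText
{
    // Rebuilds a paragraph's text: text runs are appended as-is,
    // and each tab element becomes a '\t' character.
    static string ParagraphText(Paragraph paragraph)
    {
        var sb = new StringBuilder();
        foreach (OpenXmlElement element in paragraph.Descendants())
        {
            if (element is Text)
                sb.Append(((Text)element).Text);
            else if (element is TabChar)
                sb.Append('\t');
        }
        return sb.ToString();
    }

    // Returns one string per top-level paragraph in the document body.
    public static List<string> ReadParagraphs(string path)
    {
        var result = new List<string>();
        // false = open read-only
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;
            foreach (Paragraph p in body.Elements<Paragraph>())
            {
                result.Add(ParagraphText(p));
            }
        }
        return result;
    }
}
```

Each returned string should then look like `NUMBER<TAB>STRING` and can be split on the tab. This only covers DOCX; the older binary DOC format needs a different library.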

Upvotes: 5

Sagar Modi

Reputation: 41

You can use Code7248.word_reader.dll.

Below is sample code showing how to use it.

Add a reference to this DLL in your project and copy the code below.

using System;
using System.Collections.Generic;
using System.Text;
//add extra namespaces
using Code7248.word_reader;


namespace testWordRead
{
    class Program
    {
        private void readFileContent(string path)
        {
            TextExtractor extractor = new TextExtractor(path);
            string text = extractor.ExtractText();
            Console.WriteLine(text);
        }
        static void Main(string[] args)
        {
            Program cs = new Program();
            // Verbatim string so the backslashes are not treated as escape sequences
            string path = @"D:\Test\testdoc1.docx";
            cs.readFileContent(path);
            Console.ReadLine();
        }
    }
}

Upvotes: 2

Tony Qu

Reputation: 776

Update: NPOI supports docx now. Please try the latest release (NPOI 2.0 beta)

Upvotes: 1
