Thomas

Reputation: 34208

How to extract text from an MS Word .doc or .docx file

I am developing a resume archive where people upload their resumes and each file is saved to a specific location. People may use any version of MS Word to prepare their resume, so the file extension could be .doc or .docx. Is there a free library I can use to extract the text from a .doc or .docx file that works for every MS Word version and also works when MS Word is not installed on the PC? I searched Google and found some articles on extracting text from .doc files, but I am not sure whether those approaches work for all MS Word versions. Please tell me which library I should use to extract text regardless of the Word version, and point me to some good articles on the subject.

Also, is there a viewer I can use to display .doc file content from my C# application, regardless of the MS Word version? Thanks.

I found an answer:

**You need to add a reference to Microsoft.Office.Interop.Word**

    using System;
    using System.IO;

    // Reads the temporary .txt file produced by SaveAsText, then deletes it.
    public static string GetText(string strfilename)
    {
        string strRetval = "";
        if (File.Exists(strfilename))
        {
            try
            {
                strRetval = File.ReadAllText(strfilename);
            }
            catch (Exception ex)
            {
                SendErrorMail(ex); // my own error-reporting helper (not shown)
            }
            finally
            {
                // The text file is only an intermediate artifact, so remove it.
                if (File.Exists(strfilename))
                    File.Delete(strfilename);
            }
        }

        // Return an empty string when the file contained only whitespace.
        return strRetval.Trim() != "" ? strRetval : "";
    }

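    // Opens the document in Word and saves a plain-text copy next to it.
    // Word must be installed on the machine for the interop call to work.
    public static string SaveAsText(string strfilename)
    {
        string fileName = "";
        object miss = System.Reflection.Missing.Value;
        Microsoft.Office.Interop.Word.Application wordApp = null;
        Microsoft.Office.Interop.Word.Document doc = null;
        try
        {
            wordApp = new Microsoft.Office.Interop.Word.Application();
            // Same folder and base name as the source file, with a .txt extension.
            fileName = Path.ChangeExtension(strfilename, ".txt");
            doc = wordApp.Documents.Open(strfilename, false);
            doc.SaveAs(fileName, Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatDOSText);
        }
        catch (Exception ex)
        {
            SendErrorMail(ex);
        }
        finally
        {
            if (doc != null)
            {
                doc.Close(ref miss, ref miss, ref miss);
                System.Runtime.InteropServices.Marshal.ReleaseComObject(doc);
                doc = null;
            }
            if (wordApp != null)
            {
                // Quit Word; otherwise a WINWORD.EXE process is left running per call.
                wordApp.Quit(ref miss, ref miss, ref miss);
                System.Runtime.InteropServices.Marshal.ReleaseComObject(wordApp);
                wordApp = null;
            }
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
        return fileName;
    }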

Upvotes: 2

Views: 11623

Answers (2)

Chris

Reputation: 491

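Use the Microsoft.Office.Interop.Word NuGet package:

    // Requires: using Microsoft.Office.Interop.Word;
    string docPath = @"C:\whereEverTheFileIs.doc";
    Application app = new Application();
    Document doc = app.Documents.Open(docPath);

    // Content.Text returns the whole document body as one string.
    string words = doc.Content.Text;
    doc.Close();
    app.Quit();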
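
Note that the interop route only works on a machine where Word is installed. For .docx files specifically, the text can be read without Word, because a .docx is a ZIP archive whose main body lives in `word/document.xml`. The following is a minimal sketch under that assumption, using only `System.IO.Compression` and `System.Xml.Linq` (on .NET Framework you may need to reference the System.IO.Compression assembly). The class name `DocxText` and the method name `Extract` are made up for illustration. It ignores headers, footers, tabs and line breaks, may repeat text that sits in text boxes, and does not handle the older binary .doc format at all.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: pulls paragraph text out of a .docx without Word.
public static class DocxText
{
    // WordprocessingML main namespace used by elements in word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of each paragraph (w:p) on its own line.
    public static string Extract(Stream docxStream)
    {
        using (var zip = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; is this a .docx file?");

            using (Stream xml = entry.Open())
            {
                XDocument doc = XDocument.Load(xml);
                // Each paragraph's text is the concatenation of its w:t runs.
                var paragraphs = doc.Descendants(W + "p")
                    .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));
                return string.Join(Environment.NewLine, paragraphs);
            }
        }
    }

    // Convenience overload for a file on disk.
    public static string Extract(string path)
    {
        using (FileStream fs = File.OpenRead(path))
            return Extract(fs);
    }
}
```

Usage would look like `string text = DocxText.Extract(@"C:\whereEverTheFileIs.docx");`. One way to combine the two approaches is to branch on the file extension, taking this path for .docx and falling back to interop for .doc.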

Upvotes: 0
