Muflix
Muflix

Reputation: 6768

OpenXml.Spreadsheet class throws FileFormatException error on simple excel file

I need to read 50MB .XLS excel but im getting this error:

System.IO.FileFormatException {"File contains corrupted data."}

I choose OpenXMLReader class for reading, because it seems to fit on this scenario. But it crash on the first line of code.

SpreadsheetDocument.Open(filename, false)

The Excel file can be downloaded here: http://www.filedropper.com/saxfile its simple 97-2003 excel test file with 10 rows

    static void Main(string[] args)
        {
            ReadExcelFileSAX(@"C:\Users\User\Desktop\SAXFile.xls");
        }

    static void ReadExcelFileSAX(string filename)
    {
        using (SpreadsheetDocument myDoc = SpreadsheetDocument.Open(filename, false))
        {
            WorkbookPart workbookPart = myDoc.WorkbookPart;
            WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();

            OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
            string text;
            while (reader.Read())
            {
                if (reader.ElementType == typeof(CellValue))
                {
                    text = reader.GetText();
                    Console.WriteLine(text);
                }
            }
        }
    }

What am i missing ?

Upvotes: 1

Views: 6956

Answers (3)

Luis Miguel Viana
Luis Miguel Viana

Reputation: 11

Use ExcelDataReader for xls and xlsx.

public static DataTable XLSFileStreamReader(string filePath)
{

    try
    {

        FileStream stream = new FileStream(filePath, FileMode.Open);

        //IExcelDataReader excelReader = ExcelReaderFactory.CreateBinaryReader(stream);

        IExcelDataReader excelReader = null;
        excelReader = ExcelReaderFactory.CreateReader(stream);

        DataSet result = excelReader.AsDataSet();

        excelReader.Close();

        return result.Tables[0];

    }
    catch (Exception e)
    {
        
        throw;
    }

}

Upvotes: 0

Muflix
Muflix

Reputation: 6768

This code before using OpenXML will help (it save xls as xlsx), but its little bit workaround

    static void ConvertXlsToXlsx(string xlsFilePath, string xlsxFilePath)
    {
        Microsoft.Office.Interop.Excel.Application excelApp = new Microsoft.Office.Interop.Excel.Application();
        excelApp.Visible = false;

        Microsoft.Office.Interop.Excel.Workbook eWorkbook = excelApp.Workbooks.Open(xlsFilePath, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing);
        eWorkbook.SaveAs(xlsxFilePath, Microsoft.Office.Interop.Excel.XlFileFormat.xlOpenXMLWorkbook, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Microsoft.Office.Interop.Excel.XlSaveAsAccessMode.xlNoChange, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing);
        eWorkbook.Close(false, Type.Missing, Type.Missing);
    }

Better solution is maybe to use different library ExcelDataReader https://exceldatareader.codeplex.com/ which i tested and work (example print .CSV to Console)

    static void XLSFileStreamReader(string filePath)
    {
        FileStream stream = new FileStream(filePath, FileMode.Open);

        // Reading from a binary Excel file ('97-2003 format; *.xls)
        IExcelDataReader excelReader2003 = ExcelReaderFactory.CreateBinaryReader(stream);

        // DataSet - The result of each spreadsheet will be created in the result.Tables
        DataSet result = excelReader2003.AsDataSet();

        // Data Reader methods
        foreach (DataTable table in result.Tables)
        {
            for (int i = 0; i < table.Rows.Count; i++)
            {
                for (int j = 0; j < table.Columns.Count; j++)
                    Console.Write("\"" + table.Rows[i].ItemArray[j] + "\";");
                Console.WriteLine();
            }
        }

        // Free resources (IExcelDataReader is IDisposable)
        excelReader2003.Close();
    }

i found it here https://gist.github.com/Munawwar/924413

PS: FileStream maybe should be in USING block.

Upvotes: 0

scanny
scanny

Reputation: 28863

The file you're trying to open is a pre-OpenXML binary format Excel file.

I'm sure the reader raises an error as soon as it detects it is not a valid ZIP archive.

Upvotes: 4

Related Questions