Michael Richardson
Michael Richardson

Reputation: 4282

How do I insert Excel cells without creating a corrupt file?

I'm using the OpenXML SDK to update the contents of an Excel spreadsheet. When inserting cells into an Excel row they must be inserted in the correct order or the file will not open properly in Excel. I'm using the following code to find the first cell that will be after the cell I am inserting. This code comes almost directly from the OpenXML SDK documentation

public static Cell GetFirstFollowingCell(Row row, string newCellReference)
{
    Cell refCell = null;
    foreach (Cell cell in row.Elements<Cell>())
    {
        if (string.Compare(cell.CellReference.Value, newCellReference, true) > 0)
        {
            refCell = cell;
            break;
        }
    }

    return refCell;
}

When I edit files with this code and then open them in Excel, Excel reports that the file is corrupted. Excel is able to repair the file, but most of the data is removed from the workbook. Why does this result in file corruption?

Side note: I tried two different .NET Excel libraries before turning to the painfully low-level OpenXML SDK. NPOI created spreadsheets with corruption and EPPlus threw an exception whenever I tried to save. I was using the most recent version of each.

Upvotes: 2

Views: 997

Answers (1)

Michael Richardson
Michael Richardson

Reputation: 4282

The code you are using is seriously flawed. This is very unfortunate, seeing as it comes from the documentation. It may work acceptably for spreadsheets that only use the first 26 columns but will fail miserably when confronted with "wider" spreadsheets. The first 26 columns are named alphabetically, A-Z. Columns 27-52 are named AA-AZ. Column 53-78 are named BA-BZ. (You should notice the pattern.)

Cell "AA1" should come after all cells with a single character column name (i.e. "A1" - "Z1"). Let's examine the current code comparing cell "AA1" with cell "B1".

  1. string.Compare("B1", "AA1", true) returns the value 1
  2. The code interprets this to mean that "AA1" should be placed before cell "B1".
  3. The calling code will insert "AA1" before "B1" in the XML.

At this point the cells will be out of order and the Excel file is corrupted. Clearly, string.Compare by itself is not a sufficient test to determine the proper order of cells in a row. A more sophisticated comparison is required.

public static bool IsNewCellAfterCurrentCell(string currentCellReference, string newCellReference)
{
    var columnNameRegex = new Regex("[A-Za-z]+");
    var currentCellColumn = columnNameRegex.Match(currentCellReference).Value;
    var newCellColumn = columnNameRegex.Match(newCellReference).Value;
    var currentCellColumnLength = currentCellColumn.Length;
    var newCellColumnLength = newCellColumn.Length;
    if (currentCellColumnLength == newCellColumnLength)
    {
        var comparisonValue = string.Compare(currentCellColumn, newCellColumn, StringComparison.OrdinalIgnoreCase);
        return comparisonValue > 0;
    }

    return currentCellColumnLength < newCellColumnLength;
}

If you wanted to place a new cell in column "BC" and you were comparing to cell "D5" you would use IsCellAfterColumn("D5", "BC5"). Substituting the new comparison function into the original code and simplifying with LINQ:

public static Cell GetFirstFollowingCell(Row row, string newCellReference)
{
    var rowCells = row.Elements<Cell>();
    return rowCells.FirstOrDefault(c => IsNewCellAfterCurrentCell(c.CellReference.Value, newCellReference));
}

Upvotes: 1

Related Questions