keshav
keshav

Reputation: 31

Delete multiple rows in Excel file using Apache POI

I have very little knowledge of Apache POI. Is there any optimized way to delete the Excel rows quickly using Apache POI and Java?

It is taking more than two hours to remove the 11000 rows and is it possible to delete multiple rows at a time?

I have used the following code:

for (int s = rowNum; s < 11000; s++) {
    Row r = sheet.getRow(s);
    if(r != null) {
        sheet.removeRow(r);
    }
}

Upvotes: 1

Views: 1317

Answers (1)

dodobird
dodobird

Reputation: 320

Unfortunately, as of apache-poi-5.0.0 it cannot be done any faster.

The problem of removing a row from an XSSFSheet may look linear but is actually quadratic.

So when you try to remove mutliple XSSFRows from an XSSFSheet, you are also removing the XSSFCells associated with it. Thus, more the number of XSSFCells the slower it gets.

Code from XSSFSheet:

/**
  * Remove a row from this sheet.  All cells contained in the row are removed as well
  *
  * @param row  the row to remove.
*/
@Override
public void removeRow(Row row) {
  if (row.getSheet() != this) {
         throw new IllegalArgumentException("Specified row does not belong to this sheet");
  }
  // collect cells into a temporary array to avoid ConcurrentModificationException
  ArrayList<XSSFCell> cellsToDelete = new ArrayList<>();
  for (Cell cell : row) {
      cellsToDelete.add((XSSFCell)cell);
  }

  for (XSSFCell cell : cellsToDelete) {
       row.removeCell(cell);
  }

  final int rowNum = row.getRowNum();
  // Performance optimization: explicit boxing is slightly faster than auto-unboxing, though may use more memory
  //noinspection UnnecessaryBoxing
  final Integer rowNumI = Integer.valueOf(rowNum); // NOSONAR
  // this is not the physical row number!
  final int idx = _rows.headMap(rowNumI).size();
  _rows.remove(rowNumI);
  worksheet.getSheetData().removeRow(idx);

  // also remove any comment located in that row
  if(sheetComments != null) {
     for (CellAddress ref : getCellComments().keySet()) {
          if (ref.getRow() == rowNum) {
              sheetComments.removeComment(ref);
          }
      }
   }
}

My suggestion would be to copy the XSSFRows from the existing XSSFSheet to a new XSSFSheet and then remove the previous XSSFSheet. Use this iff, number of XSSFRows to be deleted is greater than the number of XSSFRows to be copied.

private void copyRowsAndDeleteSheet(XSSFWorkbook workbook) {
    XSSFSheet srcSheet = workbook.createSheet();
    // Inserting dummy data 
    for (int i = 0; i < 11000; i++) {
        XSSFRow row = srcSheet.createRow(i);
        row.createCell(0).setCellType(CellType.STRING);
        row.getCell(0).setCellValue("Hello" + i);
    }
    
    // Sheet to be copied to
    XSSFSheet destSheet = workbook.createSheet();
    // Defines how the cell should be copied
    CellCopyPolicy policy = new CellCopyPolicy().createBuilder().cellStyle(true).build();
    for (int i = 0; i < 100; i++) {
        // Row to be copied
        XSSFRow srcRow = srcSheet.getRow(i);
        // Inserting into a i+1 instead of i to avoid IllegalArgumentException throw by FormulaShifter
        destSheet.createRow(i + 1).copyRowFrom(srcRow, policy);
    }
    
    // Shifting the rows up by 1 row to match source sheet
    destSheet.shiftRows(1, 100, -1);
    
    // removing the source sheet
    workbook.removeSheetAt(workbook.getSheetIndex(srcSheet));
}

Note: copyRowFrom() is still beta so it would be wise to not use it in a production environment.

Upvotes: 2

Related Questions