Reputation: 21361
In my code I am going through an XLSX-file row by row, validating them against a database with Apache POI 4.1.0. If I find a incorrect row I will "mark" them for deletion by adding it to the the List<XSSFRow> toRemove
. After iterating over every row this small method is supposed to remove the rows marked for deletion:
ListIterator<XSSFRow> rowIterator = toRemove.listIterator(toRemove.size());
while (rowIterator.hasPrevious()) {
XSSFRow row = rowIterator.previous();
if (row != null && row.getSheet() == sheet) {
int lastRowNum = sheet.getLastRowNum();
int rowIndex = row.getRowNum();
if (rowIndex == lastRowNum) {
sheet.removeRow(row);
} else if (rowIndex >= 0 && rowIndex < lastRowNum) {
sheet.removeRow(row);
} else {
System.out.println("\u001B[31mERROR: Removal failed because row " + rowIndex + " is out of bounds\u001B[0m");
}
System.out.println("Row " + rowIndex + " successfully removed");
} else {
System.out.println("Row skipped in removal because it was null already");
}
}
But for some unknown reason it removes all rows perfectly and then throws a XmlValueDisconnectedException when getting the row index (getRowNum()
) of the last (first added) row.
Relevant part of the Stacktrace:
org.apache.xmlbeans.impl.values.XmlValueDisconnectedException
at org.apache.xmlbeans.impl.values.XmlObjectBase.check_orphaned(XmlObjectBase.java:1258)
at org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTRowImpl.getR(Unknown Source)
at org.apache.poi.xssf.usermodel.XSSFRow.getRowNum(XSSFRow.java:400)
at Overview.removeRows(Overview.java:122)
EDIT: I also tried changing the iteration process (see below) but the error stays the same.
for (XSSFRow row : toRemove) {
// same code as above without iterator and while
}
Upvotes: 0
Views: 3195
Reputation: 61860
The error occurs if one row is double contained in List toRemove
. A List
allows duplicate entries. So the same row may be double added to the List
. If then Iterator
gets the first occurrence of that row and this will be removed properly from the sheet. But then if the same row occurs again later, the row.getRowNum()
fails that way because the row does not more exists in the sheet.
Here is complete code to reproduce that behavior:
import org.apache.poi.ss.usermodel.*;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.*;
public class ExcelRemoveRows {
public static void main(String[] args) throws Exception {
String filePath = "Excel.xlsx"; // must contain at least 5 filled rows
Workbook workbook = WorkbookFactory.create(new FileInputStream(filePath));
Sheet sheet = workbook.getSheetAt(0);
List<Row> toRemoveList = new ArrayList<Row>();
toRemoveList.add(sheet.getRow(0));
toRemoveList.add(sheet.getRow(2));
toRemoveList.add(sheet.getRow(4));
toRemoveList.add(sheet.getRow(2)); // this produces the error
System.out.println(toRemoveList); // contains row hawing index 2 (r="3") two times
for (Row row : toRemoveList) {
System.out.println(row.getRowNum()); // XmlValueDisconnectedException on second occurance of row index 2
sheet.removeRow(row);
}
FileOutputStream out = new FileOutputStream("Changed"+filePath);
workbook.write(out);
out.close();
workbook.close();
}
}
The solution is to avoid that the List
contains the same row multiple times.
I would not collecting the rows to remove in a List<XSSFRow>
but the row numbers to remove in a Set<Integer>
. That would avoid duplicates since a Set
does not allow duplicate elements. The row to remove then can simply got via sheet.getRow(rowNum)
.
Code:
...
Set<Integer> toRemoveSet = new HashSet<Integer>();
toRemoveSet.add(sheet.getRow(0).getRowNum());
toRemoveSet.add(sheet.getRow(2).getRowNum());
toRemoveSet.add(sheet.getRow(4).getRowNum());
toRemoveSet.add(sheet.getRow(2).getRowNum());
System.out.println(toRemoveSet); // does not contain the row index 2 two times
for (Integer rowNum : toRemoveSet) {
Row row = sheet.getRow(rowNum);
System.out.println(row.getRowNum());
sheet.removeRow(row);
}
...
Upvotes: 1