Reputation: 21
I'm a junior software engineer. Currently, I am struggling to convert an excel file from xls to xlsx without losing formats/formulas. I need your advise. I have been trying a few solutions using POI, but the table format is messed up. Please advise.
I have tried a few solutions online, even reading the POI documentation, but it did not help.
public class conversion_to_xlsx {
public static void main(String[] args) throws InvalidFormatException,
IOException {
String inpFn = "C:\\Users\\user\\JavaProjects\\workspace\\Sample2\\src\\excelFileGenerate6\\report.xls";
String outFn = "C:\\Users\\user\\JavaProjects\\workspace\\Sample2\\src\\excelFileGenerate6\\converted_report.xlsx";
FileInputStream in = new FileInputStream(inpFn);
try {
Workbook wbIn = new HSSFWorkbook(in);
File outF = new File(outFn);
if (outF.exists())
outF.delete();
Workbook wbOut = new XSSFWorkbook();
int sheetCnt = wbIn.getNumberOfSheets();
for (int i = 0; i < sheetCnt; i++) {
Sheet sIn = wbIn.getSheetAt(i);
//Sheet sOut = wbOut.getSheet(null);
Sheet sOut = wbOut.createSheet(sIn.getSheetName());
Iterator<Row> rowIt = sIn.rowIterator();
while (rowIt.hasNext()) {
Row rowIn = rowIt.next();
Row rowOut = sOut.createRow(rowIn.getRowNum());
Iterator<Cell> cellIt = rowIn.cellIterator();
while (cellIt.hasNext()) {
Cell cellIn = cellIt.next();
Cell cellOut = rowOut.createCell(
cellIn.getColumnIndex(), cellIn.getCellType());
switch (cellIn.getCellType()) {
case Cell.CELL_TYPE_BLANK:
break;
case Cell.CELL_TYPE_BOOLEAN:
cellOut.setCellValue(cellIn.getBooleanCellValue());
break;
case Cell.CELL_TYPE_ERROR:
cellOut.setCellValue(cellIn.getErrorCellValue());
break;
case Cell.CELL_TYPE_FORMULA:
cellOut.setCellFormula(cellIn.getCellFormula());
break;
case Cell.CELL_TYPE_NUMERIC:
cellOut.setCellValue(cellIn.getNumericCellValue());
break;
case Cell.CELL_TYPE_STRING:
cellOut.setCellValue(cellIn.getStringCellValue());
break;
}
{
CellStyle styleIn = cellIn.getCellStyle();
CellStyle styleOut = cellOut.getCellStyle();
styleOut.setDataFormat(styleIn.getDataFormat());
}
cellOut.setCellComment(cellIn.getCellComment());
// HSSFCellStyle cannot be cast to XSSFCellStyle
// cellOut.setCellStyle(cellIn.getCellStyle());
}
}
}
FileOutputStream out = new FileOutputStream(outF);
try {
wbOut.write(out);
} finally {
out.close();
}
} finally {
in.close();
}
}
}
Upvotes: 1
Views: 358
Reputation: 670
I haven't done this conversion before either. However, if you have charts and objects in the Excel, then the manual coding of a conversion tool (like POI) may result a complicated endeavour which may challenge by time.
Have you tried using the Excel bundle-in tool?
C:\Program Files\Microsoft Office\root\OfficeXX\excelcnv.exe
You should be able to do something like
excelcnv.exe -oice "input.xls" "output.xlsx"
And if you have many files to convert, then you can do via a batch file:
@echo off
setlocal
set "inputDir=C:\path\to\your\input\directory"
set "outputDir=C:\path\to\your\output\directory"
set "converter=C:\Program Files\Microsoft Office\root\OfficeXX\excelcnv.exe"
for %%f in ("%inputDir%\*.xls") do (
set "filename=%%~nf"
%converter% -oice "%%f" "%outputDir%\%filename%.xlsx"
)
endlocal
echo Conversion completed!
pause
NOTE:
Upvotes: 3
Reputation: 61870
Converting excel file from xls to xlsx retaining all formatting and functionality - is it possible using Apache POI?
What Apache POI is
The objective of Apache POI is to provide a Java API to read and write Microsoft Office files without the need of installed Microsoft Office applications. It provides access to binary file formats of Microsoft Office up to 2007, binary *.xls
-Excel-files for example, as well as to Office Open XML file formats of Microsoft Office 2007 or higher, *.xlsx
-Excel-files for example. Both file formats are currently open documented and therefore understandable without the need for reverse engineering.
Apache POI is a mature project. But nevertheless it is far away from providing access to all the features of both file formats, simply because of the complexity and sheer amount of functionality Microsoft had put into it's Office applications since decades.
What Apache POI not is
Apache POI does not provide all the functionality which Microsoft Office provides in it's application GUIs. The code of the Microsoft Office applications is closed source and Apache POI's objective is not to reverse engineer this code. There are exceptions. For example, some (one?) of the Apache POI developers decided to program a renderer for PowerPoint slides to be able to export slides to image (picture) files. And there is a formula evaluator for Excel formulas, simply because this is necessary to be able to re-calculate formulas when attribute values where changed. But all of those exceptions use code which differs from the code which Microsoft uses and therefore might lead to different results too.
In short: Apache POI is able to read data out of Microsoft Office files and to write data into such files. But it is far away from providing access to all the features of the file formats. And it will not "act as a Microsoft Office application" in any way.
So the answer to the question in first sentence of this answer is: No, converting excel file from xls to xlsx retaining all formatting and functionality is not possible using Apache POI. At least not now. And I doubt that this will be possible in the future, as it would mean that Apache POI would have to catch up with Microsoft for decades.
Converting excel file from xls to xlsx retaining all formatting and functionality - is it possible using Java programming?
The only application which can do this really properly is Microsoft Excel itself. Therefore, if there would be a Java API which provides access to a Microsoft Excel application to remote control this application, or at least was able to use Microsoft's COM interface to Microsoft Excel application, then it would be possible.
LibreOffice and/or OpenOffice are able to work with Microsoft Office files much like Microsoft Office. They provide not all features but most. And they offer programming APIs, also for Java. So using LibreOffice and/or OpenOffice to solve the task would be an option too.
@MaduKan refers to excelcnv.exe
which Microsoft provides in its Office installations. This could be used using Java programming with ProcessBuilder
. See example:
import java.io.*;
class ExcelcnvProcessBuilder {
static void convertXlsToXlsx(String excelcnv, File fSource, File fTarget) {
try {
ProcessBuilder pb = new ProcessBuilder(excelcnv, "-oice", "\"" + fSource.getAbsolutePath() + "\" \"" + fTarget.getAbsolutePath() + "\"");
Process process = pb.start();
} catch (IOException e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
String excelcnv = "\"C:\\Program Files\\Microsoft Office\\root\\Office16\\excelcnv.exe\"";
File dSource = new File("C:/java/source");
File[] directoryListing = dSource.listFiles(new FileFilter() {
public boolean accept(File file) {
if (file.getName().toLowerCase().endsWith(".xls")) {
return true;
}
return false;
}
});
if (directoryListing != null) {
for (File fSource : directoryListing) {
File fTarget = new File("C:/java/target/" + fSource.getName() + "x");
System.out.println(fSource.getAbsolutePath());
System.out.println(fTarget.getAbsolutePath());
convertXlsToXlsx(excelcnv, fSource, fTarget);
}
}
}
}
The disadvantages should be clear:
excelcnv.exe
. Code explicitly needs to be adopted to different installed Microsoft Office versions.Upvotes: 2