Reputation: 963
xls to csv using POI where excel having other language like japanese ie. UTF-8 characters
// For storing data into CSV files
StringBuffer data = new StringBuffer();
try
{
FileOutputStream fos = new FileOutputStream(outputFile);
// Get the workbook object for XLS file
HSSFWorkbook workbook = new HSSFWorkbook(new FileInputStream(inputFile));
// Get first sheet from the workbook
HSSFSheet sheet = workbook.getSheetAt(0);
Cell cell;
Row row;
// Iterate through each rows from first sheet
Iterator<Row> rowIterator = sheet.iterator();
while (rowIterator.hasNext())
{
row = rowIterator.next();
// For each row, iterate through each columns
Iterator<Cell> cellIterator = row.cellIterator();
while (cellIterator.hasNext())
{
cell = cellIterator.next();
switch (cell.getCellType())
{
case Cell.CELL_TYPE_BOOLEAN:
data.append(cell.getBooleanCellValue() + ",");
break;
case Cell.CELL_TYPE_NUMERIC:
data.append(cell.getNumericCellValue() + ",");
break;
case Cell.CELL_TYPE_STRING:
data.append(cell.getStringCellValue() + ",");
break;
case Cell.CELL_TYPE_BLANK:
data.append("" + ",");
break;
default:
data.append(cell + ",");
}
data.append('\n');
}
}
fos.write(data.toString().getBytes());
fos.close();
}
catch (FileNotFoundException e)
{
e.printStackTrace();
}
catch (IOException e)
{
e.printStackTrace();
}
}
public static void main(String[] args)
{
File inputFile = new File("C:\test.xls");
File outputFile = new File("C:\output.csv");
xls(inputFile, outputFile);
}
here the issue is i have cell values having other language charaters like テスト,再テスト1,498レッドアップルコート . when i use above code the csv file is getting create but having values as "??????"
i tried with below also
sValue = cell.getRichStringCellValue().getString();
still unable to convert those word
please help me . thanks in advance
Upvotes: 0
Views: 1483
Reputation: 61945
OK, because of many misunderstandings about text encoding[see:note1] and because of the inability of Microsoft
to provide Unicode
support simply usable for CSV
in Excel
, let's have a concrete example:
Following Excel
file:
and following code:
import org.apache.poi.ss.usermodel.*;
import java.io.*;
import java.nio.charset.StandardCharsets;
class ReadExcelWriteCSV {
public static void main(String[] args) throws Exception {
Workbook workbook = WorkbookFactory.create(new FileInputStream("inputFile.xls"));
StringBuffer data = new StringBuffer();
Sheet sheet = workbook.getSheetAt(0);
for (Row row : sheet) {
String delimiter = "";
for (Cell cell : row) {
data.append(delimiter);
switch (cell.getCellTypeEnum()) {
case STRING:
data.append(cell.getStringCellValue());
break;
//case ...
default:
System.out.println("default cell"); //should never occur
}
delimiter = ",";
}
data.append("\r\n");
}
System.out.println(data);
BufferedWriter bwr = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("outputFile.csv"), StandardCharsets.UTF_8));
bwr.write('\ufeff'); //write the UTF-8 BOM to make Excel able to detect UTF-8 encoding
bwr.write(data.toString());
bwr.flush();
bwr.close();
workbook.close();
}
}
produces a outputFile.csv
having text encoding UTF-8
. It also has a Byte order mark which makes it easy determinable to be UTF-8
encoded. This is necessary because else Excel
cannot properly detect this encoding while simply opening the CSV
file by double click.
If we are opening the resulting outputFile.csv
using a text editor, it should be shown properly and the editor should be able correct detecting UTF-8
encoding.
If we are opening the resulting outputFile.csv
using Text Import Wizard in Excel
, then in the step 1 we also should be able correct setting UTF-8
encoding:
The Text Import Wizard looks a little bit different in Office 365:
Note: Images are from German Excel.
And because of the BOM
even if simply opened in Excel
by double click the file, it should be shown properly because of UTF-8
encoding detected properly.
[note1]: To only talk about one misunderstanding: You are writing about "UTF-8 characters". This is nonsense. There are Unicode
characters which may be encoded using UTF-8
. UTF-8
is an encoding and not a character set.
Upvotes: 0