Anji R
Anji R

Reputation: 963

xls to csv using POI where excel having other language like japanese ie. UTF-8 characters

xls to csv using POI where excel having other language like japanese ie. UTF-8 characters

 // For storing data into CSV files
        StringBuffer data = new StringBuffer();
        try 
        {
        FileOutputStream fos = new FileOutputStream(outputFile);

        // Get the workbook object for XLS file
        HSSFWorkbook workbook = new HSSFWorkbook(new FileInputStream(inputFile));
        // Get first sheet from the workbook
        HSSFSheet sheet = workbook.getSheetAt(0);
        Cell cell;
        Row row;

        // Iterate through each rows from first sheet
        Iterator<Row> rowIterator = sheet.iterator();
        while (rowIterator.hasNext()) 
        {
                row = rowIterator.next();
                // For each row, iterate through each columns
                Iterator<Cell> cellIterator = row.cellIterator();
                while (cellIterator.hasNext()) 
                {
                        cell = cellIterator.next();

                        switch (cell.getCellType()) 
                        {
                        case Cell.CELL_TYPE_BOOLEAN:
                                data.append(cell.getBooleanCellValue() + ",");
                                break;

                        case Cell.CELL_TYPE_NUMERIC:
                                data.append(cell.getNumericCellValue() + ",");
                                break;

                        case Cell.CELL_TYPE_STRING:
                                data.append(cell.getStringCellValue() + ",");
                                break;

                        case Cell.CELL_TYPE_BLANK:
                                data.append("" + ",");
                                break;

                        default:
                                data.append(cell + ",");
                        }

                        data.append('\n'); 
                }
        }

        fos.write(data.toString().getBytes());
        fos.close();
        }
        catch (FileNotFoundException e) 
        {
                e.printStackTrace();
        }
        catch (IOException e) 
        {
                e.printStackTrace();
        }
        }

        public static void main(String[] args) 
        {
                File inputFile = new File("C:\test.xls");
                File outputFile = new File("C:\output.csv");
                xls(inputFile, outputFile);
        }

here the issue is i have cell values having other language charaters like テスト,再テスト1,498レッドアップルコート . when i use above code the csv file is getting create but having values as "??????"

i tried with below also

sValue = cell.getRichStringCellValue().getString();

still unable to convert those word

please help me . thanks in advance

Upvotes: 0

Views: 1483

Answers (1)

Axel Richter
Axel Richter

Reputation: 61945

OK, because of many misunderstandings about text encoding[see:note1] and because of the inability of Microsoft to provide Unicode support simply usable for CSV in Excel, let's have a concrete example:

Following Excel file:

enter image description here

and following code:

import org.apache.poi.ss.usermodel.*;

import java.io.*;
import java.nio.charset.StandardCharsets;

class ReadExcelWriteCSV {

 public static void main(String[] args) throws Exception {

  Workbook workbook = WorkbookFactory.create(new FileInputStream("inputFile.xls"));

  StringBuffer data = new StringBuffer();

  Sheet sheet = workbook.getSheetAt(0);
  for (Row row : sheet) {
   String delimiter = "";
   for (Cell cell : row) {
    data.append(delimiter);

    switch (cell.getCellTypeEnum()) {
     case STRING:
      data.append(cell.getStringCellValue());
      break;
     //case ...

     default:
      System.out.println("default cell"); //should never occur
    }

    delimiter = ",";
   }
   data.append("\r\n"); 
  }

  System.out.println(data);

  BufferedWriter bwr = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("outputFile.csv"), StandardCharsets.UTF_8));

  bwr.write('\ufeff'); //write the UTF-8 BOM to make Excel able to detect UTF-8 encoding
  bwr.write(data.toString());
  bwr.flush();
  bwr.close();

  workbook.close();

 }
}

produces a outputFile.csv having text encoding UTF-8. It also has a Byte order mark which makes it easy determinable to be UTF-8 encoded. This is necessary because else Excel cannot properly detect this encoding while simply opening the CSV file by double click.

If we are opening the resulting outputFile.csv using a text editor, it should be shown properly and the editor should be able correct detecting UTF-8 encoding.

If we are opening the resulting outputFile.csv using Text Import Wizard in Excel, then in the step 1 we also should be able correct setting UTF-8 encoding:

enter image description here

The Text Import Wizard looks a little bit different in Office 365:

enter image description here

Note: Images are from German Excel.

And because of the BOM even if simply opened in Excel by double click the file, it should be shown properly because of UTF-8 encoding detected properly.

[note1]: To only talk about one misunderstanding: You are writing about "UTF-8 characters". This is nonsense. There are Unicode characters which may be encoded using UTF-8. UTF-8 is an encoding and not a character set.

Upvotes: 0

Related Questions