Shubham Jain
Shubham Jain

Reputation: 17553

Apache POI jar is not able to write a special character in excel file

I have a special character in my string which I want to write in my excel file but poi jar replacing it with ?

Special character as below string:

enter image description here

Some site throw below unicode

1)

Unicode character          Oct  Dec Hex HTML
\x{F}   si shift in ctrl-o  017 15  0xF 

2)

U+000F : <control-000F> (SHIFT IN [SI])

Example:

My string is as below

enter image description here

Excel showing output as :

PrincipalOffice~?DIRECTION

Code I am using is as something below:

String filename= "D:\\DataFiles\\"+File+"";
XSSFWorkbook hwb =new XSSFWorkbook();
XSSFSheet sheet =  hwb.createSheet("Data");
XSSFRow rowhead=   sheet.createRow((short)0);
rowhead.createCell((short) 0).setCellValue("my");
XSSFRow row=   sheet.createRow((short)i);
String value = "PrincipalOffice~DIRECTION";
row.createCell((short) 0).setCellValue(value);
FileOutputStream fileOut =  new FileOutputStream(filename);
hwb.write(fileOut);
fileOut.close();
hwb.close();

I have tried with 3.15 and 3.17 jars of apache poi Do I need to use some other libary?, if yes please suggest

Upvotes: 2

Views: 4736

Answers (2)

Shubham Jain
Shubham Jain

Reputation: 17553

Adding to Axel point.

There are many control characters. To identify them dump your data to notepad++ or any tool which will show you the control character name/detail.

Below are the link which has listed all control characters

https://ascii.cl/control-characters.htm

Now found its unicode on google like "si unicode". you will found an string like U+000F.

Now replace the string as \u000F and put in replace like _x000F_

Some examples:

U+000F <Shift In> (SI) Unicode Character - Compart

value = value.replace("\u000F", "_x000F_");

U+001C <File Separator> (FS) Unicode Character - Compart

value = value.replace("\u001C", "_x001C_");

U+001E <Record Separator> (RS) Unicode Character - Compart

value = value.replace("\u001E", "_x001E_");

Upvotes: 1

Axel Richter
Axel Richter

Reputation: 61870

The character 0x0F or &#15;, you are mentioned, is not able to store directly in XML since it is a control character. So since *.xlsx files stores content using XML, that character cannot be stored directly.

Nevertheless does Microsoft defining: ECMA-376 Part 1 22.4 Variant Types 22.4.2.4 bstr (Basic String):

22.4.2.4 bstr (Basic String)

This element defines a binary basic string variant type, which can store any valid Unicode character. Unicode characters that cannot be directly represented in XML as defined by the XML 1.0 specification, shall be escaped using the Unicode numerical character representation escape character format _xHHHH_, where H represents a hexadecimal character in the character's value. [Example: The Unicode character 8 is not permitted in an XML 1.0 document, so it shall be escaped as _x0008_. end example] To store the literal form of an escape sequence, the initial underscore shall itself be escaped (i.e. stored as _x005F_). [Example: The string literal _x0008_ would be stored as _x005F_x0008_. end example]

The possible values for this element are defined by the W3C XML Schema string datatype.

This extends the W3C XML Schema string datatype. So that the character sequence _xHHHH_ does have a special meaning as a kind of entity like &#xHHHH;.

So the following will work and Excel will have the character 0x0F as cell content.

import java.io.FileOutputStream;

import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class CreateExcelControlCharacter {

 public static void main(String[] args) throws Exception {

  Workbook wb = new XSSFWorkbook();

  Sheet sheet = wb.createSheet();
  Row row = sheet.createRow(0);
  Cell cell = row.createCell(0);

  String value = "PrincipalOffice\u000FDIRECTION";

  value = value.replace("\u000F", "_x000F_");

  cell.setCellValue(value);

  wb.write(new FileOutputStream("CreateExcelControlCharacter.xlsx"));
  wb.close();
 }

}

But may I ask why you are needing this control character to be a Excel cell content? That could be the answer to my question Useful use cases for escape character format _xHHHH_ in Office Open XML?.

Upvotes: 5

Related Questions