Reputation: 17553
I have a special character in my string which I want to write in my excel file but poi jar replacing it with ?
Special character as below string:
Some site throw below unicode
1)
Unicode character Oct Dec Hex HTML
\x{F} si shift in ctrl-o 017 15 0xF 
2)
U+000F : <control-000F> (SHIFT IN [SI])
Example:
My string is as below
Excel showing output as :
PrincipalOffice~?DIRECTION
Code I am using is as something below:
String filename= "D:\\DataFiles\\"+File+"";
XSSFWorkbook hwb =new XSSFWorkbook();
XSSFSheet sheet = hwb.createSheet("Data");
XSSFRow rowhead= sheet.createRow((short)0);
rowhead.createCell((short) 0).setCellValue("my");
XSSFRow row= sheet.createRow((short)i);
String value = "PrincipalOffice~DIRECTION";
row.createCell((short) 0).setCellValue(value);
FileOutputStream fileOut = new FileOutputStream(filename);
hwb.write(fileOut);
fileOut.close();
hwb.close();
I have tried with 3.15 and 3.17 jars of apache poi Do I need to use some other libary?, if yes please suggest
Upvotes: 2
Views: 4736
Reputation: 17553
Adding to Axel point.
There are many control characters. To identify them dump your data to notepad++ or any tool which will show you the control character name/detail.
Below are the link which has listed all control characters
https://ascii.cl/control-characters.htm
Now found its unicode on google like "si unicode". you will found an string like U+000F.
Now replace the string as \u000F
and put in replace like _x000F_
Some examples:
U+000F <Shift In> (SI) Unicode Character - Compart
value = value.replace("\u000F", "_x000F_");
U+001C <File Separator> (FS) Unicode Character - Compart
value = value.replace("\u001C", "_x001C_");
U+001E <Record Separator> (RS) Unicode Character - Compart
value = value.replace("\u001E", "_x001E_");
Upvotes: 1
Reputation: 61870
The character 0x0F
or 
, you are mentioned, is not able to store directly in XML
since it is a control character. So since *.xlsx
files stores content using XML
, that character cannot be stored directly.
Nevertheless does Microsoft defining: ECMA-376 Part 1 22.4 Variant Types 22.4.2.4 bstr (Basic String):
22.4.2.4 bstr (Basic String)
This element defines a binary basic string variant type, which can store any valid Unicode character. Unicode characters that cannot be directly represented in XML as defined by the XML 1.0 specification, shall be escaped using the Unicode numerical character representation escape character format
_xHHHH_
, where H represents a hexadecimal character in the character's value. [Example: The Unicode character 8 is not permitted in an XML 1.0 document, so it shall be escaped as_x0008_
. end example] To store the literal form of an escape sequence, the initial underscore shall itself be escaped (i.e. stored as_x005F_
). [Example: The string literal_x0008_
would be stored as_x005F_x0008_
. end example]The possible values for this element are defined by the W3C XML Schema string datatype.
This extends the W3C XML Schema string datatype. So that the character sequence _xHHHH_
does have a special meaning as a kind of entity like &#xHHHH;
.
So the following will work and Excel
will have the character 0x0F
as cell content.
import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
public class CreateExcelControlCharacter {
public static void main(String[] args) throws Exception {
Workbook wb = new XSSFWorkbook();
Sheet sheet = wb.createSheet();
Row row = sheet.createRow(0);
Cell cell = row.createCell(0);
String value = "PrincipalOffice\u000FDIRECTION";
value = value.replace("\u000F", "_x000F_");
cell.setCellValue(value);
wb.write(new FileOutputStream("CreateExcelControlCharacter.xlsx"));
wb.close();
}
}
But may I ask why you are needing this control character to be a Excel
cell content? That could be the answer to my question Useful use cases for escape character format _xHHHH_ in Office Open XML?.
Upvotes: 5