Reputation: 6832
I have this static method that receives a file path from a .xls
file and extracts a single String
out of it. I'm using POI to retrieve this data. This is the code:
private static String readsIPCASheet() throws Exception {
String filePath = unzipsIPCAfile();
InputStream input = new FileInputStream(filePath);
HSSFWorkbook wb = new HSSFWorkbook(new POIFSFileSystem(input));
ExcelExtractor extractor = new ExcelExtractor(wb);
extractor.setFormulasNotResults(true);
extractor.setIncludeSheetNames(false);
extractor.setIncludeBlankCells(true);
String text = extractor.getText();
extractor.close();
}
I discovered that using my local machine (located in Brazil) I receive the following data:
198.22 40.27 171.24 568.17 98.22 3,035.71
But when I run this code on my VPS (located in US) I receive this:
198,22 40,27 171,24 568,17 98,22 3.035,71
Using unix locale
, I got this:
Remote:
wildfly@server:~$ locale
LANG=pt_BR.UTF-8
LANGUAGE=
LC_CTYPE=pt_BR.UTF-8
LC_NUMERIC=pt_BR.UTF-8
LC_TIME=pt_BR.UTF-8
LC_COLLATE="pt_BR.UTF-8"
LC_MONETARY=pt_BR.UTF-8
LC_MESSAGES="pt_BR.UTF-8"
LC_PAPER=pt_BR.UTF-8
LC_NAME=pt_BR.UTF-8
LC_ADDRESS=pt_BR.UTF-8
LC_TELEPHONE=pt_BR.UTF-8
LC_MEASUREMENT=pt_BR.UTF-8
LC_IDENTIFICATION=pt_BR.UTF-8
LC_ALL=
Local:
gabriel ~ $ locale
LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE=pt_BR.UTF-8
LC_NUMERIC=pt_BR.UTF-8
LC_TIME=pt_BR.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=pt_BR.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=pt_BR.UTF-8
LC_NAME=pt_BR.UTF-8
LC_ADDRESS=pt_BR.UTF-8
LC_TELEPHONE=pt_BR.UTF-8
LC_MEASUREMENT=pt_BR.UTF-8
LC_IDENTIFICATION=pt_BR.UTF-8
LC_ALL=
Since I dealing with regex after this piece of code, I prefer to set a standard about this than solving two unnecessary situations. How can accomplish this?
Upvotes: 0
Views: 1014
Reputation: 61925
The apache poi
Excel extractors (HSSF
as well as XSSF
) are using DataFormatter for formatting numeric values. So DataFormatter
decides to use comma or dot as thousand separator. At least in the last stable apache poi
version 3.17
, it does this decision dependent of the LocaleUtil.getUserLocale(). See DataFormatter.java:235ff.
So if you do setting LocaleUtil.setUserLocale before creating the extractor, then this locale will be used to decide to use comma or dot as thousand separator.
Example
...
LocaleUtil.setUserLocale(java.util.Locale.US);
ExcelExtractor extractor = new ExcelExtractor(wb);
...
uses US
locale in DataFormatter
and so uses dot as decimal separator and comma as thousand separator.
Upvotes: 3