Wayne

Reputation: 3519

Sorting according to a locale is not consistent across Linux, OSX and Android

I am trying to get the same sort order on Android, Linux and OSX. I am comparing the output of the sort command on Linux and OSX with some custom code on Android that operates on a similar file set.

On Linux / OSX I use this command:

find {folder_name} -type f | sort

and in Java / Android I am using this, but the sort orders do not align:

private Enumeration<InputStream> getSortedStreams(HashMap<String, InputStream> collection) {

    Vector<InputStream> fileStreams = new Vector<>();

    // Copy the keys so they can be sorted independently of the map.
    List<String> keys = new ArrayList<>(collection.keySet());

    // Is Locale.US the right locale to use here?
    Collator collator = Collator.getInstance(Locale.US);
    Collections.sort(keys, collator);
    for (String key : keys) {
        Log.d(TAG, "getSortedStreams: " + key);
        fileStreams.add(collection.get(key));
    }

    return fileStreams.elements();
}

Android output:

1000/abc_d.txt
1000/abc-d.txt

OSX output:

1000/abc-d.txt
1000/abc_d.txt

I assume the differences come from the locales used to sort the file list. From what I gather, OSX and Linux are both POSIX compliant, although Linux is not certified. Android is not POSIX compliant either, but my guess is that this does not matter for sorting.

The details below show what I tried in order to make sense of this and get consistent results across the platforms.

It seems that I can control both Linux and Android to align, but OSX is ignoring the environment variables I set.

I need specific help setting the locales so that I get consistent results across the platforms.

I have not run tests on iOS yet; if required I can submit them.

More details:

On Fedora Core.

Test case: create two files with the following names in a directory named sort_test

sort_test/abc_d.txt
sort_test/abc-d.txt

On Fedora Core 17 (3.9.10-100.fc17.x86_64)

locale -a for en_US is:

locale -a | grep en_US

en_US
en_US.iso88591
en_US.iso885915
en_US.utf8

USING C

find sort_test/ -type f | env -i LC_COLLATE=C sort
sort_test/abc-d.txt
sort_test/abc_d.txt

USING en_US.utf8

find sort_test/ -type f | env -i LC_COLLATE=en_US.utf8 sort
sort_test/abc_d.txt
sort_test/abc-d.txt
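
The two results above differ only in how '-' and '_' are ranked. With LC_COLLATE=C the comparison goes by byte value, and '-' (0x2D) comes before '_' (0x5F). As a minimal sketch of the same ordering in Java, assuming ASCII-only file names, plain String ordering without a Collator should reproduce it. The class and method names are hypothetical and not part of the Android code above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only.
final class NaturalOrderSort {

    // Returns a sorted copy of the names using String.compareTo, which
    // compares UTF-16 code units. For ASCII-only names this is byte order.
    static List<String> sortByCodeUnit(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted); // natural ordering, no Collator involved
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("sort_test/abc_d.txt", "sort_test/abc-d.txt");
        // '-' is 0x2D and '_' is 0x5F, so abc-d.txt is listed first.
        for (String name : sortByCodeUnit(names)) {
            System.out.println(name);
        }
    }
}
```

This only illustrates the byte-value ordering; it says nothing about how en_US.utf8 ranks punctuation.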

On OSX something seems to be off; setting the locale has no effect:

locale -a gives a list of locales, and the en_US locales are:

en_US
en_US.ISO8859-1
en_US.ISO8859-15
en_US.US-ASCII
en_US.UTF-8

USING C

find sort_test -type f | env -i LC_COLLATE=C sort
sort_test/abc-d.txt
sort_test/abc_d.txt

USING en_US.UTF-8

find sort_test -type f | env -i LC_COLLATE=en_US.UTF-8 sort
sort_test/abc-d.txt
sort_test/abc_d.txt

On Android, with the locale set to a POSIX locale:

Locale locale = new Locale("en", "US", "POSIX"); // <<< the fix
Collator collator = Collator.getInstance(locale);
Collections.sort(keys, collator);
for (String key : keys) {
    Log.d(TAG, "getSortedStreams: " + key);
    fileStreams.add(collection.get(key));
}

/1000/abc-d.txt
/1000/abc_d.txt

On Android, with the locale set to US:

//Locale locale = new Locale("en", "US", "POSIX");
Collator collator = Collator.getInstance(Locale.US);
Collections.sort(keys,collator);
for (String key: keys) {
    Log.d(TAG, "getSortedStreams: " + key);
    fileStreams.add(collection.get(key));
}

/1000/abc_d.txt
/1000/abc-d.txt

Linux locale variables (output of the locale command):

LANG=en_US.UTF-8
LC_CTYPE=UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

OSX locale variables (output of the locale command):

LANG=
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=

Upvotes: 2

Views: 387

Answers (1)

Wayne

Reputation: 3519

The solution that currently seems to work for me is to align all the operating systems with OSX, that is, to sort with the C / POSIX collation everywhere.

Linux:

find sort_test -type f | env -i LC_COLLATE=C sort

OSX:

find sort_test -type f | env -i LC_COLLATE=C sort

Android:

Locale locale = new Locale("en", "US", "POSIX"); // <<< the fix
Collator collator = Collator.getInstance(locale);
Collections.sort(keys,collator);
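
If depending on the platform's collation data for the POSIX variant is a concern, one possible alternative is to skip the Collator and compare the UTF-8 bytes of the names directly, which is what sorting under LC_COLLATE=C amounts to. The following is only a sketch under that assumption. Utf8ByteOrder and its members are hypothetical names, and it assumes the file names are encoded as the same UTF-8 bytes on every platform:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, for illustration only.
final class Utf8ByteOrder {

    // Compares two strings by their UTF-8 encoding, byte by byte,
    // treating each byte as an unsigned value (0..255).
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // All shared bytes are equal: the shorter name sorts first.
        return x.length - y.length;
    }

    static final Comparator<String> COMPARATOR = Utf8ByteOrder::compareUtf8;

    // Returns a sorted copy and leaves the input list untouched.
    static List<String> sort(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted, COMPARATOR);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("1000/abc_d.txt", "1000/abc-d.txt");
        for (String name : sort(names)) {
            System.out.println(name);
        }
    }
}
```

The comparator could be passed wherever the Collator is used now, for example Collections.sort(keys, Utf8ByteOrder.COMPARATOR). For ASCII-only names it gives the same order as plain String.compareTo; the two can differ only for characters outside the Basic Multilingual Plane, where UTF-16 code unit order and UTF-8 byte order disagree.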

Upvotes: 2
