Xavi Gil
Xavi Gil

Reputation: 11568

MongoDB case insensitve and utf-8 sort

I save these documents in a MongoDB collection:

{ "_id" : ObjectId("5323850fa89de4a9f691dacf"), "name" : "B" }
{ "_id" : ObjectId("53238511a89de4a9f691dad0"), "name" : "a" }
{ "_id" : ObjectId("53238515a89de4a9f691dad1"), "name" : "A" }
{ "_id" : ObjectId("53238522a89de4a9f691dad2"), "name" : "z" }
{ "_id" : ObjectId("5323852ea89de4a9f691dad3"), "name" : "X" }
{ "_id" : ObjectId("5323855ea89de4a9f691dad4"), "name" : "á" }

Then I query sorting by name:

db.collection.find().sort({name:1});

and the result is:

{ "_id" : ObjectId("53238515a89de4a9f691dad1"), "name" : "A" }
{ "_id" : ObjectId("5323850fa89de4a9f691dacf"), "name" : "B" }
{ "_id" : ObjectId("5323852ea89de4a9f691dad3"), "name" : "X" }
{ "_id" : ObjectId("53238511a89de4a9f691dad0"), "name" : "a" }
{ "_id" : ObjectId("53238522a89de4a9f691dad2"), "name" : "z" }
{ "_id" : ObjectId("5323855ea89de4a9f691dad4"), "name" : "á" }

I have discovered that MongoDB not only doesn't support utf-8 sorting, but it also seems it doesn't support case insensitive sorting.

What is the best solution to support both types of sort?

(I am working with Mongoose and Express in a Node web application)

Upvotes: 2

Views: 4330

Answers (2)

Gianfranco P
Gianfranco P

Reputation: 10794

From v3.4 (released in Nov 2016) MongoDB supports sorting by collation (SERVER-1920 - Release Notes):

> db.myColl.insert([{_id: 1, "term": "cote"}, {_id: 2, "term": "coté"}, {_id: 3, "term" : "côte"}, {_id: 4, "term" : "côté"}])
Inserted 1 record(s) in 56ms

> db.myColl.find().sort({"term": -1})
{ "_id": 4, "term": "côté" }
{ "_id": 3, "term": "côte" }
{ "_id": 2, "term": "coté" }
{ "_id": 1, "term": "cote" }

> db.myColl.find().sort({"term": -1}).collation({"locale": "fr_CA"})
{ "_id": 4, "term": "côté" }
{ "_id": 2, "term": "coté" }
{ "_id": 3, "term": "côte" }
{ "_id": 1, "term": "cote" }

More options on the collation() method and examples here: https://docs.mongodb.com/manual/reference/method/cursor.collation/#examples

It supports these locales: https://docs.mongodb.com/manual/reference/collation-locales-defaults/#collation-languages-locales

See also this great blog post on Natural Language Sorting from Derick Rethans

Upvotes: 1

Anand Jayabalan
Anand Jayabalan

Reputation: 12904

You can use the aggregation framework to achieve this. To be more specific, you can use the $project operator to create a lower-case version of name using $toLower and then sort on it.

Example:

db.test.aggregate([
    {$project:{name:1, nameLower:{$toLower:"$name"}}}, 
    {$sort:{nameLower:1}}
])

Note: As quoted in the documentation:

$toLower may not make sense when applied to glyphs outside the Roman alphabet.

Output for the sample data set you provided:

db.test.aggregate([{$project:{name:1, nameLower:{$toLower:"$name"}}}, {$sort:{nameLower:1}}])
{
        "result" : [
                {
                        "_id" : ObjectId("5324387e87861ff5f2696216"),
                        "name" : "a",
                        "nameLower" : "a"
                },
                {
                        "_id" : ObjectId("5324388687861ff5f2696217"),
                        "name" : "A",
                        "nameLower" : "a"
                },
                {
                        "_id" : ObjectId("5324387987861ff5f2696215"),
                        "name" : "B",
                        "nameLower" : "b"
                },
                {
                        "_id" : ObjectId("5324389187861ff5f2696219"),
                        "name" : "X",
                        "nameLower" : "x"
                },
                {
                        "_id" : ObjectId("5324388a87861ff5f2696218"),
                        "name" : "z",
                        "nameLower" : "z"
                },
                {
                        "_id" : ObjectId("53243ab087861ff5f269621a"),
                        "name" : "á",
                        "nameLower" : "á"
                }
        ],
        "ok" : 1
}

Upvotes: 0

Related Questions