Reputation: 11568
I save these documents in a MongoDB collection:
{ "_id" : ObjectId("5323850fa89de4a9f691dacf"), "name" : "B" }
{ "_id" : ObjectId("53238511a89de4a9f691dad0"), "name" : "a" }
{ "_id" : ObjectId("53238515a89de4a9f691dad1"), "name" : "A" }
{ "_id" : ObjectId("53238522a89de4a9f691dad2"), "name" : "z" }
{ "_id" : ObjectId("5323852ea89de4a9f691dad3"), "name" : "X" }
{ "_id" : ObjectId("5323855ea89de4a9f691dad4"), "name" : "á" }
Then I query sorting by name:
db.collection.find().sort({name:1});
and the result is:
{ "_id" : ObjectId("53238515a89de4a9f691dad1"), "name" : "A" }
{ "_id" : ObjectId("5323850fa89de4a9f691dacf"), "name" : "B" }
{ "_id" : ObjectId("5323852ea89de4a9f691dad3"), "name" : "X" }
{ "_id" : ObjectId("53238511a89de4a9f691dad0"), "name" : "a" }
{ "_id" : ObjectId("53238522a89de4a9f691dad2"), "name" : "z" }
{ "_id" : ObjectId("5323855ea89de4a9f691dad4"), "name" : "á" }
I have discovered that MongoDB not only doesn't support utf-8 sorting, but it also seems it doesn't support case insensitive sorting.
What is the best solution to support both types of sort?
(I am working with Mongoose and Express in a Node web application)
Upvotes: 2
Views: 4330
Reputation: 10794
From v3.4 (released in Nov 2016) MongoDB supports sorting by collation (SERVER-1920 - Release Notes):
> db.myColl.insert([{_id: 1, "term": "cote"}, {_id: 2, "term": "coté"}, {_id: 3, "term" : "côte"}, {_id: 4, "term" : "côté"}])
Inserted 1 record(s) in 56ms
> db.myColl.find().sort({"term": -1})
{ "_id": 4, "term": "côté" }
{ "_id": 3, "term": "côte" }
{ "_id": 2, "term": "coté" }
{ "_id": 1, "term": "cote" }
> db.myColl.find().sort({"term": -1}).collation({"locale": "fr_CA"})
{ "_id": 4, "term": "côté" }
{ "_id": 2, "term": "coté" }
{ "_id": 3, "term": "côte" }
{ "_id": 1, "term": "cote" }
More options on the collation()
method and examples here: https://docs.mongodb.com/manual/reference/method/cursor.collation/#examples
It supports these locales: https://docs.mongodb.com/manual/reference/collation-locales-defaults/#collation-languages-locales
See also this great blog post on Natural Language Sorting from Derick Rethans
Upvotes: 1
Reputation: 12904
You can use the aggregation framework to achieve this. To be more specific, you can use the $project operator to create a lower-case version of name using $toLower and then sort on it.
Example:
db.test.aggregate([
{$project:{name:1, nameLower:{$toLower:"$name"}}},
{$sort:{nameLower:1}}
])
Note: As quoted in the documentation:
$toLower may not make sense when applied to glyphs outside the Roman alphabet.
Output for the sample data set you provided:
db.test.aggregate([{$project:{name:1, nameLower:{$toLower:"$name"}}}, {$sort:{nameLower:1}}])
{
"result" : [
{
"_id" : ObjectId("5324387e87861ff5f2696216"),
"name" : "a",
"nameLower" : "a"
},
{
"_id" : ObjectId("5324388687861ff5f2696217"),
"name" : "A",
"nameLower" : "a"
},
{
"_id" : ObjectId("5324387987861ff5f2696215"),
"name" : "B",
"nameLower" : "b"
},
{
"_id" : ObjectId("5324389187861ff5f2696219"),
"name" : "X",
"nameLower" : "x"
},
{
"_id" : ObjectId("5324388a87861ff5f2696218"),
"name" : "z",
"nameLower" : "z"
},
{
"_id" : ObjectId("53243ab087861ff5f269621a"),
"name" : "á",
"nameLower" : "á"
}
],
"ok" : 1
}
Upvotes: 0