Andrea
Andrea

Reputation: 20493

Django: comparison on extra fields

Short question: Is there a way in Django to find the next row, based on the alphabetical order of some field, in a case-insensitive way?

Long question: I have some words in the database, and a detail view for them. I would like to be able to browse the words in alphabetical order. So I need to find out the id of the previous and next word in alphabetical order. Right now what I do is the following (original is the field that stores the name of the word):

class Word(models.Model):
    original = models.CharField(max_length=50)
    ...

    def neighbours(self):
        """
        Returns the words adjacent to a given word, in alphabetical order
        """
        previous_words = Word.objects.filter(
            original__lt=self.original).order_by('-original')
        next_words = Word.objects.filter(
            original__gt=self.original).order_by('original')
        previous = previous_words[0] if len(previous_words) else None
        next = next_words[0] if len(next_words) else None
        return previous, next

The problem is that this does a case-sensitive comparison, so Foo appears before bar, which is not what I want. To avoid this problem, in another view - where I list all words, I have made use of a custom model manager which adds an extra field, like this

class CaseInsensitiveManager(models.Manager):

    def get_query_set(self):
        """
        Also adds an extra 'lower' field which is useful for ordering
        """
        return super(CaseInsensitiveManager, self).get_query_set().extra(
            select={'lower': 'lower(original)'})

and in the definition of Word I add

objects = models.Manager()
alpha = CaseInsensitiveManager()

In this way I can do queries like

Word.alpha.all().order_by('lower')

and get all words in alphabetical order, regardless of the case. But I cannot do

class Word(models.Model):
    original = models.CharField(max_length=50)
    ...

    objects = models.Manager()
    alpha = CaseInsensitiveManager()

    def neighbours(self):
        previous_words = Word.objects.filter(
            lower__lt=self.lower()).order_by('-lower')
        next_words = Word.objects.filter(
            lower__gt=self.lower()).order_by('lower')
        previous = previous_words[0] if len(previous_words) else None
        next = next_words[0] if len(next_words) else None
        return previous, next

Indeed Django will not accept field lookups based on extra fields. So, what am I supposed to do (short of writing custom SQL)?

Bonus questions: I see at least to more problems in what I am doing. First, I'm not sure about performance. I assume that no queries at all are performed when I define previous_words and next_words, and the only lookup in the database will happen when I define previous and next, yielding a query which is more or less

SELECT Word.original, ..., lower(Word.original) AS lower
WHERE lower < `foo`
ORDER BY lower DESC
LIMIT 1

Is this right? Or am I doing something which will slow down the database too much? I don't know enough details about the inner workings of the Django ORM.

The second problem is that I actually have to cope with words in different languages. Given that I know the language for each word, is there a way to get them in alphabetical order even if they have non-ASCII characters. For instance I'd want to have méchant, moche in this order, but I get moche, méchant.

Upvotes: 0

Views: 490

Answers (1)

Tom Gruner
Tom Gruner

Reputation: 9635

The database should be able to do this sorting for you, and it should be able to do so without the "lower" function.

Really what you need to fix is the database collation and encoding.

For example, if you are using mysql you could use the character set utf8 and collation utf8_general_ci

If that collation doesn't work for you, you can try other collations depending on your needs and database. But using an extra field and a function in the query is an ugly workaround that is going to slow the app down.

There are many collations options available in mysql and postgresql too:

http://dev.mysql.com/doc/refman/5.5/en/charset-mysql.html http://stackoverflow.com/questions/1423378/postgresql-utf8-character-comparison

But this is definitely a good chance to optimise at the db level.

Upvotes: 1

Related Questions