horace

Reputation: 958

In Python (or any language) what does an "upper" function do to Hindi, Amharic and other non-Latin character sets?

Subject says it all. Been looking for an answer, but cannot seem to find it.

I am writing a web app that will store data in a database and also have language files translated into a wide variety of character sets. At various moments, the text will be presented. I want to control presentation such as spurious blank spaces at the beginning and end of strings. Also I want to ensure some letters are upper or lower case.

My question is: what happens in upper/lower case functions when the character set only has one case?

EDIT Sub question: Are there any unexpected side effects to be aware of?

My guess is that you simply get back the one and only character.

EDIT - Added Description The main reason for asking this question is that I am writing a webapp that will be distributed and run on machines in remote areas with little or no chance to fix "on-the-spot" bugs. It's not a complicated webapp, but will run with many different language char sets. I want to be certain of my footing before releasing the server.

Upvotes: 2

Views: 109

Answers (2)

Andj

Reputation: 1447

An old question, but may need further elucidation.

Case mapping will leave unicameral scripts (scripts without case distinctions) as they are. Only text in bicameral scripts will be affected.
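A minimal sketch of this behaviour using only Python's built-in str methods. The sample strings and variable names are illustrative, not taken from the question:

```python
# Sample strings in unicameral scripts (scripts without a case distinction).
samples = {
    "Hindi (Devanagari)": "नमस्ते",
    "Amharic (Ethiopic)": "ሰላም",
    "Arabic": "سلام",
    "Chinese": "你好",
}

for name, text in samples.items():
    # upper() and lower() return the text unchanged when no case mapping exists.
    assert text.upper() == text, name
    assert text.lower() == text, name

# In mixed text, only the cased (here Latin) letters change.
mixed = "hello नमस्ते"
mixed_upper = mixed.upper()

# strip() removes leading and trailing whitespace regardless of script.
padded = "  ሰላም \n"
stripped = padded.strip()
```

So calling upper(), lower() or strip() on such text should be safe: the result is either the same string or one with only the cased letters and surrounding whitespace changed.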

Case mapping can be either language/locale insensitive or language/locale sensitive. In Python, the str.lower() and str.upper() methods are language and locale insensitive.

One possible side effect is that the case mapping may be wrong, depending on the language of the text being cased.
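Turkish is a commonly cited case, shown here as a small sketch. In Turkish the uppercase of "i" is the dotted "İ" (U+0130) and the lowercase of "I" is the dotless "ı" (U+0131), but Python's language-insensitive methods apply the default mapping. The variable names are illustrative:

```python
dotted_capital_i = "\u0130"  # İ, the uppercase a Turkish reader expects for "i"
dotless_small_i = "\u0131"   # ı, the lowercase a Turkish reader expects for "I"

# str.upper() and str.lower() ignore the language of the text.
upper_of_i = "i".upper()
lower_of_I = "I".lower()

# The default mappings are applied, not the Turkish ones.
assert upper_of_i == "I" and upper_of_i != dotted_capital_i
assert lower_of_I == "i" and lower_of_I != dotless_small_i
```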

It is worth noting that Unicode defines simple case mapping and full case mapping. Simple case mapping is one to one, i.e. a single character is case mapped into a single character. Full case mapping is not one to one: in some cases a single character is cased to more than one character, so the length of a string can change. Additionally, case mapping isn't symmetrical (lowercasing an uppercased string does not always return the original), and not all letters in a bicameral script have casing pairs.
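These properties can be checked directly with Python's built-in str methods. The following is a small sketch with illustrative variable names:

```python
# Full case mapping: one character can become two.
sharp_s = "ß"
sharp_s_upper = sharp_s.upper()       # "SS"
assert len(sharp_s_upper) == 2

# The ligature U+FB01 (fi) also expands to two letters.
ligature_upper = "\ufb01".upper()     # "FI"

# Case mapping is not symmetrical: the round trip does not restore "ß".
round_trip = sharp_s_upper.lower()    # "ss"
assert round_trip != sharp_s

# Some letters in a bicameral script have no case pair:
# U+0138 (LATIN SMALL LETTER KRA) has no uppercase form.
kra = "\u0138"
kra_upper = kra.upper()
assert kra_upper == kra
```

If the web app stores cased text in fixed-width database columns or compares string lengths, the possible change in length is the side effect most likely to matter.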

There are always edge cases. But this is probably only a consideration if your web app is intended to be able to handle any language. For multilingual web apps that target a specific set of languages, this may not be a problem.

Python uses full, language-insensitive case mapping.

For a multilingual web app, it would be preferable to use language sensitive methods. If you need language sensitive case mapping, you could:

  1. use PyICU or similar for case mapping in the backend, or
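  2. use JavaScript for case mapping in the frontend, for example String.prototype.toLocaleUpperCase() and toLocaleLowerCase(), which accept a locale argument.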
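To illustrate what "language sensitive" means without any third-party dependency, here is a rough standard-library sketch that handles only the Turkish dotted and dotless i. The helper names upper_tr and lower_tr are hypothetical, and this is not a substitute for PyICU, which covers many more languages and rules:

```python
def upper_tr(text: str) -> str:
    """Uppercase text using the Turkish rule i -> İ (hypothetical helper)."""
    # Map dotted small i to dotted capital İ before the default mapping runs.
    # The default mapping already turns dotless ı (U+0131) into I.
    return text.replace("i", "\u0130").upper()


def lower_tr(text: str) -> str:
    """Lowercase text using the Turkish rules I -> ı and İ -> i (hypothetical helper)."""
    # Handle both capital forms explicitly, then lowercase everything else.
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


city_upper = upper_tr("istanbul")   # İSTANBUL
city_lower = lower_tr("ISPARTA")    # ısparta
```

A helper like this would have to be selected per language by the application, which is exactly the bookkeeping a library such as PyICU does for you.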

Upvotes: 1

Ole Pannier

Reputation: 3683

First of all, the upper() and lower() methods in Python can be applied to Hindi, Amharic and other non-Latin character sets.

For instance, the upper() method converts a lowercase character only if an uppercase equivalent of that character exists. If none exists, the character is left unchanged.

Or better said, if there is nothing to convert, it stays the same.

Upvotes: 2
