Reputation: 172
I'm looking for the meaning of "[=c=]" and "[.symbol.]" in Bash, along with some examples.
Thanks.
The question "Bash - what does tr -d [=,=] do?" does not answer mine: it covers "[=c=]" only briefly and says nothing about "[.symbol.]".
Upvotes: 0
Views: 613
Reputation:
Both have to do with collation.
But, what is collation?
It is the way that characters get sorted, usually the way a dictionary would sort them.
What that means differs between languages. Some languages have no accented letters and use only ASCII letters. For those, the ASCII code of a character is enough, and characters are sorted by that value (the example skips the control characters 0-31 and 127):
$ printf '%b' "$(printf '\\U%x' {32..126})"; echo
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
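As a small illustration of that byte-value ordering, here is a minimal sketch that sorts four letters under the C locale. It assumes a POSIX `sort` is available; the variable name `sorted` is only for illustration.

```shell
# Sort four letters by raw byte value, which is what the C locale does.
# All uppercase letters (65-90) come before all lowercase letters (97-122).
sorted=$(printf '%s\n' b A a B | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

Under the C locale this should print A, B, a, b, one per line. A dictionary-style locale would typically keep a and A next to each other instead.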
However, things are never so simple.
How should a C and a c be sorted in a dictionary?
Most of the time the answer is: together.
Think about it: where are you going to look for the word Canada? Inside the entry for c?
Yes, that makes sense, doesn't it?
And that is the starting point for "equivalent" characters.
Of course c is equivalent to c:
$ [[ c =~ [[=c=]] ]] && echo "yes" || echo "no"
yes
And d is not equivalent to c:
$ [[ d =~ [[=c=]] ]] && echo "yes" || echo "no"
no
In many cases, C is also equivalent to c:
$ [[ C =~ [[=c=]] ]] && echo "yes" || echo "no"
yes
But, again, it is not so simple. This does not hold in every locale:
$ LC_COLLATE=C ; [[ C =~ [[=c=]] ]] && echo "yes" || echo "no"
no
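Note that an assignment such as `LC_COLLATE=C ;` changes the locale of the current shell for every later command, and that `LC_ALL`, if set, overrides `LC_COLLATE`. A minimal sketch of a helper that keeps the change contained, assuming Bash; the name `is_equiv` is hypothetical and not part of Bash:

```shell
# is_equiv LOCALE STRING CHAR   (hypothetical helper name)
# Succeeds if STRING contains a character equivalent to CHAR under LOCALE.
# The body is a subshell, so the caller's locale is left untouched.
is_equiv() (
    LC_ALL=$1
    re="[[=$3=]]"     # keep the regex in a variable so [[ ]] parses it cleanly
    [[ $2 =~ $re ]]
)

is_equiv C c c && echo "yes" || echo "no"
is_equiv C C c && echo "yes" || echo "no"
is_equiv C d c && echo "yes" || echo "no"
```

With the C locale the three calls should print yes, no, no, matching the transcripts above. Passing another installed locale name as the first argument repeats the test under that locale.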
In German, the umlaut 'ü' should collate with u:
$ LC_COLLATE=de_DE.UTF8; [[ ü =~ [[=u=]] ]] && echo "yes" || echo "no"
yes
The same happens in English:
$ LC_COLLATE=en_US.UTF8; [[ ü =~ [[=u=]] ]] && echo "yes" || echo "no"
yes
It also seems reasonable that all accented characters with e as a base:
é è ê ë ề ḕ É È Ê Ë Ề Ḕ
should collate together. That is what Unicode does.
The concept of a [.….] has to do with digraphs, in which a pair of letters represents a single sound. In some languages, such a pair acts as an additional letter of the alphabet:
Collating Symbols
A collating symbol is a multi-character collating
element enclosed in [. and .]. For example, if ch
is a collating element, then [[.ch.]] is a regular
expression that matches this collating element,
while [ch] is a regular expression that matches
either c or h.
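A single character is also a collating element, so the [. .] syntax can be tried without any special locale. A minimal sketch, assuming Bash and the C locale; the name `has_hyphen` is made up for illustration:

```shell
# has_hyphen STRING   (hypothetical helper name)
# Succeeds if STRING contains a hyphen, written as the collating symbol [.-.]
# so that it cannot be mistaken for a range operator inside the brackets.
has_hyphen() (
    LC_ALL=C
    re='[[.-.]]'
    [[ $1 =~ $re ]]
)

has_hyphen "a-b" && echo "yes" || echo "no"
has_hyphen "ab"  && echo "yes" || echo "no"
```

The first call should print yes and the second no. Multi-character elements such as ch or ll only work when the active locale defines them.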
The US Spanish locale still retains the old collating symbol for ll:
$ LC_COLLATE=es_US.UTF8; [[ olla =~ [[.ll.]] ]] && echo "yes" || echo "no"
yes
But Spain has (long ago) removed such use:
$ LC_COLLATE=es_ES.UTF8; [[ olla =~ [[.ll.]] ]] && echo "yes" || echo "no"
no
Other countries will surely have other rules.
Upvotes: 6