Grégory Roche

Reputation: 172

What is the meaning of [=c=] and [.symbol.] in Bash?

I'm looking for the meaning of "[=c=]" and "[.symbol.]" in Bash, with some examples.

Thanks.

The question "Bash - what does tr -d [=,=] do?" does not answer mine: its answer only briefly touches on "[=c=]" and says nothing about "[.symbol.]".

Upvotes: 0

Views: 613

Answers (1)

user8017719


Both have to do with collation.

But, what is collation?

It is the way characters get sorted, often the way a dictionary would sort them.

What that means differs from language to language. Some languages have no accented letters and use only ASCII letters. For those, the ASCII code of a character is enough, and characters are sorted by their ASCII value (skipping the control characters 0-31 and 127):

$ printf '%b' "$(printf '\\U%x' {32..126})"; echo
 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
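The same byte-by-byte ordering is what sort(1) uses when locale collation is switched off with the C locale. A minimal sketch: `sort_bytewise` and `ascii_codes` are hypothetical helper names, and `printf '%d' "'A"` relies on the standard printf rule that a leading quote prints the character's numeric code:

```bash
# Hypothetical helper: sort stdin by raw byte value.
# LC_ALL=C disables locale collation, so ASCII codes decide the order.
sort_bytewise() {
  LC_ALL=C sort
}

# Hypothetical helper: print the numeric (ASCII) code of each argument.
ascii_codes() {
  local ch
  for ch in "$@"; do
    printf '%s=%d\n' "$ch" "'$ch"   # "'A" makes printf print 65
  done
}

ascii_codes A B a b                     # A=65 B=66 a=97 b=98 (one per line)
printf '%s\n' b a B A | sort_bytewise   # A B a b: every uppercase letter first
```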

However, things are never that simple. How should C and c be sorted in a dictionary? Most of the time the answer is: together. Think about it: where would you look for the word Canada? Under the entry for c? Yes, that makes sense, doesn't it?
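A quick way to see the difference is to sort the same words twice. Plain byte order puts every capitalised word before every lowercase one, while `sort -f` (fold lowercase to uppercase) files Canada-style words next to their lowercase neighbours, as a dictionary would. This is only a case-folding approximation swapped in for illustration, not the full locale collation that [= =] relies on; the word list is made up:

```bash
words=(canada Cuba apple Banana)

# Byte order (C locale): uppercase initials (codes 65-90) all come first.
printf '%s\n' "${words[@]}" | LC_ALL=C sort      # Banana Cuba apple canada

# Case folding (-f): c and C compare as the same letter.
printf '%s\n' "${words[@]}" | LC_ALL=C sort -f   # apple Banana canada Cuba
```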

[= =]

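That is where the idea of "equivalent" characters comes in: inside a bracket expression, [=c=] is an equivalence class that matches every character the locale treats as equivalent to c. Of course, c is equivalent to c: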

$ [[ c =~ [[=c=]] ]] && echo "yes" || echo "no"
yes

And d is not equivalent to c:

$ [[ d =~ [[=c=]] ]] && echo "yes" || echo "no"
no
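Since [=c=] is just another bracket-expression class, the test can be wrapped in a small function. In this sketch (the name `in_equiv_class` is hypothetical) the pattern is stored in a variable, the usual way to keep bash from mis-parsing brackets after `=~`, and anchored with ^ and $ so the whole string must be a single equivalent character. The two results shown hold in any locale:

```bash
# Hypothetical helper: succeed if CHAR ($1) belongs to the POSIX
# equivalence class [=BASE=] ($2) under the current LC_COLLATE.
in_equiv_class() {
  local re="^[[=$2=]]\$"   # e.g. ^[[=c=]]$ : exactly one equivalent char
  [[ $1 =~ $re ]]
}

for ch in c d; do
  if in_equiv_class "$ch" c; then
    echo "$ch: yes"
  else
    echo "$ch: no"
  fi
done
# c: yes
# d: no
```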

In many locales, C is also equivalent to c:

$ [[ C =~ [[=c=]] ]] && echo "yes" || echo "no"
yes

But, again, it is not that simple; this does not hold in every locale:

$ LC_COLLATE=C ; [[ C =~ [[=c=]] ]] && echo "yes" || echo "no"
no
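Equivalence classes are not specific to `[[ =~ ]]`: tr(1) accepts the same [=c=] syntax, which is what the `tr -d [=,=]` question is about. A minimal sketch in the C locale with a hypothetical `strip_equiv_c` helper. Note that the GNU coreutils manual describes equivalence classes in GNU tr as not fully implemented (each class holds only the character itself), so tr will not show the locale-dependent matches that bash's regex engine can:

```bash
# Hypothetical helper: delete from stdin every character equivalent to c.
# The class is quoted so the shell cannot expand [=c=] as a filename glob.
strip_equiv_c() {
  LC_ALL=C tr -d '[=c=]'
}

echo 'cCd' | strip_equiv_c              # Cd: in the C locale, C is not equivalent to c
echo 'a,b=c' | LC_ALL=C tr -d '[=,=]'   # ab=c: only the comma is removed
```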

In German, the umlaut ü should collate with u:

$ LC_COLLATE=de_DE.UTF8; [[ ü =~ [[=u=]] ]] && echo "yes" || echo "no"
yes

The same happens in the US English locale:

$ LC_COLLATE=en_US.UTF8; [[ ü =~ [[=u=]] ]] && echo "yes" || echo "no"
yes
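Because the answer depends on which locales are installed, it can help to run the same check across several of them and skip the missing ones. A sketch under these assumptions: the helper names `locale_installed` and `equiv_in_locale` are hypothetical, `locale -a` is used to list installed locales, and names are compared ignoring case and dashes (de_DE.UTF-8 vs de_DE.utf8). Only the C row is predictable; the other rows depend on your C library's collation tables:

```bash
# Hypothetical helper: succeed if LOCALE ($1) appears in `locale -a`.
# C and POSIX are required by POSIX, so they are always accepted.
locale_installed() {
  local want=${1,,} name
  want=${want//-/}
  case $want in c|posix) return 0 ;; esac
  while IFS= read -r name; do
    name=${name,,}
    [[ ${name//-/} == "$want" ]] && return 0
  done < <(locale -a 2>/dev/null)
  return 1
}

# Hypothetical helper: print yes/no if CHAR ($2) is in the equivalence
# class [=BASE=] ($3) under LOCALE ($1), or "unavailable" if not installed.
equiv_in_locale() {
  local loc=$1 ch=$2 base=$3 re
  if ! locale_installed "$loc"; then
    echo unavailable
    return 0
  fi
  re="^[[=$base=]]\$"
  # The subshell keeps the LC_ALL change away from the caller.
  if ( LC_ALL=$loc; [[ $ch =~ $re ]] ) 2>/dev/null; then
    echo yes
  else
    echo no
  fi
}

for loc in C de_DE.UTF-8 en_US.UTF-8; do
  printf '%-12s %s\n' "$loc" "$(equiv_in_locale "$loc" ü u)"
done
```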

It also seems reasonable that all accented characters with e as a base:

é è ê ë ề ḕ É È Ê Ë Ề Ḕ

should collate together. That is what Unicode collation does.
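To see which of those letters fall into [=e=] on a given system, one option is to filter them with grep, which supports the same bracket syntax. A sketch with a hypothetical `equiv_filter` helper; it assumes en_US.UTF-8 is installed (if it is not, grep silently falls back to C behaviour), and which accented forms match there depends on the C library's collation data:

```bash
# Hypothetical helper: from stdin, print only lines consisting of a single
# character in the equivalence class [=BASE=] ($2), collating under LOCALE ($1).
equiv_filter() {
  LC_ALL=$1 grep -e "^[[=$2=]]\$"
}

letters=(e é è ê ë ề ḕ É È Ê Ë Ề Ḕ x)

# C locale: bytes only, so just the plain letter e survives.
printf '%s\n' "${letters[@]}" | equiv_filter C e

# Unicode-aware locale (assumed installed); grep exits 1 if nothing matches.
printf '%s\n' "${letters[@]}" | equiv_filter en_US.UTF-8 e || true
```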

[. .]

The [. .] notation has to do with digraphs: pairs of letters that represent a single sound and that, in some languages, act as an additional letter of the alphabet:

Collating Symbols
A collating symbol is a multi-character collating element enclosed in [. and .]. For example, if ch is a collating element, then [[.ch.]] is a regular expression that matches this collating element, while [ch] is a regular expression that matches either c or h.
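The difference between a set of characters and a collating symbol can be tried in any locale with a single-character element; multi-character elements such as [[.ch.]] only work where the locale defines them. A sketch with a hypothetical `matches_whole` helper that anchors the pattern so it must match the entire string:

```bash
# Hypothetical helper: succeed if the whole STRING ($1) matches REGEX ($2).
matches_whole() {
  local re="^$2\$"
  [[ $1 =~ $re ]]
}

matches_whole h '[ch]' && echo "h matches [ch]"           # a set: c or h
matches_whole ch '[ch]' || echo "ch does not match [ch]"  # one character only
matches_whole c '[[.c.]]' && echo "c matches [[.c.]]"     # single-char collating symbol
```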

The US Spanish locale (es_US) still retains the old collating symbol for ll:

$ LC_COLLATE=es_US.UTF8; [[ olla =~ [[.ll.]] ]] && echo "yes" || echo "no"
yes

But Spain (es_ES) dropped that usage long ago:

$ LC_COLLATE=es_ES.UTF8; [[ olla =~ [[.ll.]] ]] && echo "yes" || echo "no"
no
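One caveat with the `&& echo "yes" || echo "no"` idiom: bash's `[[ =~ ]]` returns 1 for a non-match but 2 when the regular expression cannot be compiled, and both print "no". In a locale that does not define ll as a collating element, [[.ll.]] may be rejected as invalid rather than simply not matching. A sketch that keeps the cases apart; `collating_match` is a hypothetical name, and it assumes the named locales are installed (check with `locale -a`), since a failed locale switch silently leaves the previous one in effect:

```bash
# Hypothetical helper: try the collating symbol [[.ELEMENT.]] ($3)
# against STRING ($2) under LOCALE ($1). Prints yes, no, or invalid.
collating_match() {
  local re="[[.$3.]]" status=0
  # Subshell keeps the locale change local; stderr hides setlocale warnings.
  ( LC_ALL=$1; [[ $2 =~ $re ]] ) 2>/dev/null || status=$?
  case $status in
    0) echo yes ;;
    1) echo no ;;
    *) echo invalid ;;   # status 2: the regex did not compile in this locale
  esac
}

for loc in es_US.UTF-8 es_ES.UTF-8; do
  printf '%-12s %s\n' "$loc" "$(collating_match "$loc" olla ll)"
done
```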

Other locales surely have other rules.

Upvotes: 6
