UTF-8 range table in Go

Question

I have been reading the documentation for Go's unicode package and I'm wondering what the use case for range tables is. What can they be used for? Is there a function to retrieve the range that a single character can be found in?

Hymns For Disco · Accepted Answer

The purpose of a range table is that it is an efficient way to describe a set of characters. Due to the way that characters are added to the Unicode standard, characters with similar properties will often be found together. So, it's usually more space-efficient to list the ranges where a specific set of characters exist, rather than listing every individual character.
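As a rough illustration of the space saving, here is a minimal sketch of a hand-built table. The names asciiLetters and isASCIILetter are hypothetical and exist only for this example; unicode.RangeTable, unicode.Range16 and unicode.Is are from the standard library. Two ranges stand in for 52 individual characters.

```go
package main

import (
	"fmt"
	"unicode"
)

// asciiLetters is a hypothetical table made up for this example:
// two ranges describe all 52 ASCII letters.
var asciiLetters = &unicode.RangeTable{
	R16: []unicode.Range16{
		{Lo: 'A', Hi: 'Z', Stride: 1}, // Stride 1 means every code point in the range
		{Lo: 'a', Hi: 'z', Stride: 1},
	},
	LatinOffset: 2, // both ranges end at or below unicode.MaxLatin1
}

// isASCIILetter reports whether r is a member of asciiLetters.
func isASCIILetter(r rune) bool {
	return unicode.Is(asciiLetters, r)
}

func main() {
	for _, r := range "Go1!" {
		fmt.Printf("%q: %v\n", r, isASCIILetter(r))
	}
}
```

The ranges have to be sorted and non-overlapping, which is what lets the lookup stop early.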

This allows you to look up if a given character exists within a specific character set by performing a series of range checks. If the character's Unicode code point is within any of the ranges in the range table, then that character is considered to be an element of the character set that the range table describes.
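The series of range checks could look roughly like the sketch below. It is a simplified linear scan written for illustration, not the standard library's implementation (unicode.Is does the same job and switches to binary search for larger tables). The helper names inTable and inLatin are assumptions made up for this example.

```go
package main

import (
	"fmt"
	"unicode"
)

// inTable reports whether r falls inside any range of tab.
// It relies on the ranges being sorted and non-overlapping.
func inTable(tab *unicode.RangeTable, r rune) bool {
	// 16-bit ranges cover code points up to U+FFFF.
	for _, rng := range tab.R16 {
		lo, hi, stride := rune(rng.Lo), rune(rng.Hi), rune(rng.Stride)
		if r < lo {
			break // every later range starts even higher
		}
		if r <= hi {
			// With a stride above 1, only every stride-th code point is a member.
			return (r-lo)%stride == 0
		}
	}
	// 32-bit ranges cover the code points above U+FFFF.
	for _, rng := range tab.R32 {
		lo, hi, stride := rune(rng.Lo), rune(rng.Hi), rune(rng.Stride)
		if r < lo {
			break
		}
		if r <= hi {
			return (r-lo)%stride == 0
		}
	}
	return false
}

// inLatin is a small convenience wrapper used by the example.
func inLatin(r rune) bool {
	return inTable(unicode.Latin, r)
}

func main() {
	for _, r := range "AéЯ1" {
		fmt.Printf("%q in Latin: %v (unicode.Is: %v)\n", r, inLatin(r), unicode.Is(unicode.Latin, r))
	}
}
```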

There isn't a general function to retrieve the range that a single character can be found in, because the character -> range relationship isn't unique or particularly useful in the general case. For example, take the letter A (code point 65). It lies in the range [65, 90] (ASCII uppercase letters), but also in the range [0, 127] (all ASCII characters), and in arbitrary ranges such as [9, 9999] and [60, 70].
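What you can do is ask which of the predefined tables contain a character. The sketch below walks the unicode.Scripts and unicode.Categories maps exported by the standard library; the helper tablesContaining and its "script:"/"category:" labels are made up for this example. A single character typically shows up in several tables, and the exact list can vary with the Go version.

```go
package main

import (
	"fmt"
	"sort"
	"unicode"
)

// tablesContaining returns the names of the predefined script and
// category tables that contain r, sorted for stable output.
func tablesContaining(r rune) []string {
	var names []string
	for name, tab := range unicode.Scripts {
		if unicode.Is(tab, r) {
			names = append(names, "script:"+name)
		}
	}
	for name, tab := range unicode.Categories {
		if unicode.Is(tab, r) {
			names = append(names, "category:"+name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	fmt.Println(tablesContaining('A'))
}
```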

If you want to know if a character is in a particular set of range tables, you can use the unicode.In function.

Example:

package main

import (
    "fmt"
    "unicode"
)

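func main() {
    // unicode.In reports whether the rune is a member of at least one
    // of the range tables passed after it.
    found := unicode.In('A', unicode.Latin)
    fmt.Println(found)
}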

This prints true. The call checks whether A belongs to any of the given range tables, here only unicode.Latin, which the package documents as "the set of Unicode characters in script Latin".
