Roland Illig

Reputation: 41625

Check whether a Unicode code point is assigned

Go's unicode package contains useful functions such as IsGraphic and IsPrint, but one function is missing: IsAssigned. I could write my own by combining the existing functions, but I would expect the standard library to provide it. In Java, writing this function is easy:

boolean isAssigned(int codePoint) {
    return Character.getType(codePoint) != Character.UNASSIGNED;
}

In Go there is no function unicode.Type(rune) or unicode.IsAssigned(rune). The closest I could find is this:

func IsAssigned(r rune) bool {
    return unicode.IsControl(r) ||
        unicode.IsGraphic(r) ||
        unicode.IsSymbol(r)
}

But that code reports U+00AD (soft hyphen) as unassigned, which is wrong.
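A likely explanation, as far as I can tell from the package documentation: IsControl only covers category Cc, and IsGraphic covers L, M, N, P, S and Zs, so format characters (category Cf) such as U+00AD fall through. The sketch below illustrates this; the helper names naiveIsAssigned and isFormat are made up for the example.

```go
package main

import (
	"fmt"
	"unicode"
)

// naiveIsAssigned is the attempt from above: it combines the
// predicates that the unicode package offers out of the box.
func naiveIsAssigned(r rune) bool {
	return unicode.IsControl(r) ||
		unicode.IsGraphic(r) ||
		unicode.IsSymbol(r)
}

// isFormat reports whether r is in the general category Cf (format),
// which none of the three predicates above test for.
func isFormat(r rune) bool {
	return unicode.Is(unicode.Cf, r)
}

func main() {
	const softHyphen = '\u00AD'
	fmt.Printf("naiveIsAssigned(%U) = %t\n", softHyphen, naiveIsAssigned(softHyphen))
	fmt.Printf("isFormat(%U)        = %t\n", softHyphen, isFormat(softHyphen))
}
```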

How can I get correct information about unassigned code points?

Upvotes: 2

Views: 380

Answers (1)

putu

Reputation: 6444

I think you can check whether a code point is assigned by using unicode.Is together with unicode.Categories, although looping over every category table is not efficient:

func IsAssigned(r rune) bool {
    for _, v := range unicode.Categories {
        if unicode.Is(v, r) {
            return true
        }
    }
    return false
}

A working example is available on The Go Playground.
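A variant that does not depend on the contents of the unicode.Categories map is to list the category tables explicitly and pass them to unicode.In. This is only a sketch, under the assumption that "assigned" means "any general category other than Cn", which mirrors the Java check in the question. The variable name assignedTables is made up here. Listing the tables also guards against the map ever containing an entry that covers unassigned code points (recent Go release notes mention adding a Cn category, though I have not checked every release), in which case the loop above would return true for every rune.

```go
package main

import (
	"fmt"
	"unicode"
)

// assignedTables lists every general category except Cn (unassigned).
// L, M, N, P, S and Z are the major-category tables; the C group is
// spelled out one by one so that no table covering unassigned code
// points can slip in.
var assignedTables = []*unicode.RangeTable{
	unicode.L, unicode.M, unicode.N, unicode.P, unicode.S, unicode.Z,
	unicode.Cc, unicode.Cf, unicode.Co, unicode.Cs,
}

// IsAssigned reports whether r belongs to one of the tables above,
// according to the Unicode version compiled into the unicode package.
func IsAssigned(r rune) bool {
	return unicode.In(r, assignedTables...)
}

func main() {
	// A letter, the soft hyphen from the question, and a code point
	// that is reserved in the Unicode versions I am aware of.
	for _, r := range []rune{'A', '\u00AD', '\u0378'} {
		fmt.Printf("%U assigned: %t\n", r, IsAssigned(r))
	}
}
```

Note that surrogates (Cs) and private-use code points (Co) count as assigned here, just as Character.getType does not return UNASSIGNED for them in Java; drop those two tables if that is not what you want.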

Upvotes: 1
