Viker

Reputation: 21

How can I limit string characters to utf8mb3?

I wrote a conversion in Go for decoding inputs. It works like a charm (thanks to "golang.org/x/net/html/charset"), but now I have to limit the output to characters contained in utf8mb3 only. As far as I know, Go's built-in string handling uses full UTF-8. The problem is that the underlying database setting is locked by vendor rules and set to utf8mb3 (yep, MySQL), and we can't change it.

So far I'm using this to limit characters and rewrite disallowed ones to "*":

    // compile our regexp (MustCompile panics if the pattern is invalid)
    allowedCharsREGEX := `[^ěščřžýáíéúůťňĺľŕĚŠČŘŽÝÁÍÉÚŮŤŇĹĽŔ!?§©®±%¼½¾¿ß÷£¥¢~¡#&_\"\\/:;a-zA-Z_0-9\t\n\r\ ]`
    reg := regexp.MustCompile(allowedCharsREGEX)

    // replace characters that are not allowed with "*"
    procString := reg.ReplaceAllString(outStr, "*")

This limits the output characters, but I want to expand it to the full utf8mb3 character set. From the documentation, it seems that unicode IsValid accepts the full UTF-8 range. Is there a possible "quick solution"?

Go 1.13, Ubuntu 20.04
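If you want to stay with a regexp, the whitelist can be inverted: instead of listing allowed characters, match only the runes utf8mb3 cannot hold, which are the supplementary-plane runes U+10000 to U+10FFFF. This is a sketch under the assumption that the input is already valid UTF-8 (as the charset decoding step should produce); `nonBMP` and `replaceNonBMP` are hypothetical names, not part of the original code.

```go
package main

import (
	"fmt"
	"regexp"
)

// nonBMP matches any rune outside the Basic Multilingual Plane
// (U+10000..U+10FFFF), i.e. every rune that needs 4 bytes in UTF-8
// and therefore cannot be stored in a MySQL utf8mb3 column.
var nonBMP = regexp.MustCompile(`[\x{10000}-\x{10FFFF}]`)

// replaceNonBMP (hypothetical helper) keeps every BMP rune and rewrites
// the rest to "*". It assumes s is already valid UTF-8.
func replaceNonBMP(s string) string {
	return nonBMP.ReplaceAllString(s, "*")
}

func main() {
	fmt.Println(replaceNonBMP("Příliš žluťoučký kůň 😀 𝄞"))
}
```

Unlike the hand-written whitelist, this keeps all BMP characters (Cyrillic, CJK, symbols, and so on), which is exactly the set utf8mb3 accepts.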

Upvotes: 1

Views: 494

Answers (1)

Volker

Reputation: 42429

Not everything should be done with a regexp.

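utf8mb3 contains exactly the runes of the Basic Multilingual Plane (BMP, U+0000 to U+FFFF), i.e. the runes that UTF-8 encodes in at most 3 bytes. So you only need to check each rune's value: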

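    import "strings"

    // limitToUTF8MB3 keeps every rune of the BMP (U+0000..U+FFFF, at most
    // 3 bytes in UTF-8) and replaces all other runes with '*'.
    func limitToUTF8MB3(input string) string {
        sb := &strings.Builder{}
        for _, r := range input {
            if r <= 0xFFFF {
                sb.WriteRune(r)
            } else {
                sb.WriteByte('*')
            }
        }
        return sb.String()
    }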
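The same per-rune filter can also be written with `strings.Map` from the standard library, which calls a mapping function once per rune and builds the result string. This is a sketch of an equivalent alternative, not the answer's original code; `toUTF8MB3` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// toUTF8MB3 (hypothetical helper) maps every rune outside the BMP to '*'
// and leaves BMP runes unchanged.
func toUTF8MB3(s string) string {
	return strings.Map(func(r rune) rune {
		if r <= 0xFFFF {
			return r // BMP rune: at most 3 bytes in UTF-8
		}
		return '*' // supplementary-plane rune: needs 4 bytes
	}, s)
}

func main() {
	fmt.Println(toUTF8MB3("kůň 😀 ok"))
}
```

If you would rather drop such characters than mark them, the mapping function can return a negative value, which `strings.Map` documents as "drop this rune".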

Upvotes: 3
