Reputation: 21
I wrote a conversion routine in Go for decoding inputs. It works like a charm (thanks to "golang.org/x/net/html/charset"), but now I have to limit the output to characters that utf8mb3 can store. As far as I know, Go strings are full UTF-8 by default. The problem is that the underlying database character set is locked by vendor rules and set to utf8mb3 (yes, MySQL), and we can't change it.
So far I'm using this to limit the characters and replace disallowed ones with "*":
//compile our regexp. if fails, return undecoded
allowedCharsREGEX = `[^ěščřžýáíéúůťňĺľŕĚŠČŘŽÝÁÍÉÚŮŤŇĹĽŔ!?§©®±%¼½¾¿ß÷£¥¢~¡#&_\"\\/:;a-zA-Z_0-9\t\n\r\ ]`
reg := regexp.MustCompile(allowedCharsREGEX)
// replace disallowed chars with "*"
procString := reg.ReplaceAllString(outStr, "*")
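This limits the output characters, but I want to expand the allowed set to the full utf8mb3 character list. From the documentation, it seems the standard library's UTF-8 validity checks (such as utf8.Valid) accept full UTF-8. Is there a possible "quick solution"?
Go v1.13, Ubuntu 20.04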
Upvotes: 1
Views: 494
Reputation: 42429
Not everything should be done with a regexp.
utf8mb3 covers the runes of the BMP (U+0000 to U+FFFF), which are exactly the ones UTF-8 encodes in at most 3 bytes.
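// Copy runes that fit in utf8mb3; replace everything else with '*'.
sb := &strings.Builder{}
for _, r := range input {
	if r <= 0xFFFF { // BMP: at most 3 bytes in UTF-8
		sb.WriteRune(r)
	} else {
		sb.WriteByte('*')
	}
}
return sb.String()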
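For anyone who wants to try this as a complete program, here is a minimal self-contained sketch of the same idea. The function name limitToUTF8MB3 and the sample string are made up for illustration. It checks the encoded length with utf8.RuneLen rather than comparing against a code point; for runes produced by ranging over a string, that should amount to the same test.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// limitToUTF8MB3 (hypothetical helper name) replaces every rune that
// needs more than 3 bytes in UTF-8 with '*', so the result only holds
// characters from the BMP.
func limitToUTF8MB3(input string) string {
	var sb strings.Builder
	sb.Grow(len(input)) // the output is never longer than the input
	for _, r := range input {
		// Ranging over a string yields U+FFFD for invalid bytes,
		// which itself fits in 3 bytes and is kept as is.
		if utf8.RuneLen(r) <= 3 {
			sb.WriteRune(r)
		} else {
			sb.WriteByte('*')
		}
	}
	return sb.String()
}

func main() {
	// The emoji is U+1F600 (4 bytes in UTF-8); everything else is in the BMP.
	fmt.Println(limitToUTF8MB3("Žluťoučký kůň 😀 €")) // prints: Žluťoučký kůň * €
}
```

Note that each replaced character becomes a single '*', regardless of how many bytes it occupied in the input.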
Upvotes: 3