steve

Reputation: 183

Library/package in Go that handles string encoding?

Are there any libraries/packages in Go that emulate what vis(3) and unvis(3) do on BSD systems? I'm trying to do something that requires a printable representation of strings that contain special characters such as whitespace.

Upvotes: 0

Views: 665

Answers (1)

user6169399

No, not exactly, but if you are looking for URL encoding, you can do all of it with the net/url package:
see: Encode / decode URLs
and: Is there any example and usage of url.QueryEscape ? for golang

sample code:

fmt.Println(url.QueryEscape("https://stackoverflow.com/questions/tagged/go test\r \r\n"))

output:

https%3A%2F%2Fstackoverflow.com%2Fquestions%2Ftagged%2Fgo+test%0D+%0D%0A
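Since unvis(3) is the decoding half, here is a minimal sketch of the round trip with url.QueryUnescape. The helper names encodeField and decodeField are made up for this example; only url.QueryEscape and url.QueryUnescape come from the standard library.

```go
package main

import (
	"fmt"
	"net/url"
)

// encodeField is a hypothetical wrapper around url.QueryEscape.
func encodeField(s string) string {
	return url.QueryEscape(s)
}

// decodeField is a hypothetical wrapper around url.QueryUnescape.
// It returns an error for malformed input such as a dangling '%'.
func decodeField(s string) (string, error) {
	return url.QueryUnescape(s)
}

func main() {
	original := "go test\r \r\n"

	encoded := encodeField(original)
	fmt.Println(encoded) // go+test%0D+%0D%0A

	decoded, err := decodeField(encoded)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(decoded == original) // true
}
```

Note that this is URL query encoding, not the vis(3) format: a space becomes '+', and most punctuation is percent-encoded as well.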

or write your own:

In Go, a string is in effect a read-only slice of bytes, and string literals are UTF-8 encoded.
You can get the bytes like this:

str := "UTF-8"
bytes := []byte(str)    //   string to slice
fmt.Println(str, bytes) // UTF-8 [85 84 70 45 56]

or convert bytes to string like this:

s := string([]byte{85, 84, 70, 45, 56, 32, 0xc2, 0xb5}) // slice to string
fmt.Println(s)                                          // UTF-8 µ

0xC2 0xB5 is the UTF-8 encoding (in hex) of the character 'MICRO SIGN' (U+00B5), see: http://www.fileformat.info/info/unicode/char/00b5/index.htm

also you may get bytes like this:

for i := 0; i < len(s); i++ {
    fmt.Printf("%d: %d, ", i, s[i])
    //0: 85, 1: 84, 2: 70, 3: 45, 4: 56, 5: 32, 6: 194, 7: 181,
}

or in compact Hex format:

fmt.Printf("% x\n", s) // 55 54 46 2d 38 20 c2 b5

and get runes (Unicode codepoints) like this:

for i, v := range s {
    fmt.Printf("%d: %v, ", i, v)
    //0: 85, 1: 84, 2: 70, 3: 45, 4: 56, 5: 32, 6: 181, 
}

see: What is a rune?

and convert rune to string:

r := rune(181)
fmt.Printf("%#U\n", r) // U+00B5 'µ'
st := "this is UTF-8: " + string(r)
fmt.Println(st) // this is UTF-8: µ

convert slice of runes to string:

rs := []rune{181, 181, 181, 181}
sr := string(rs)
fmt.Println(sr) // µµµµ

convert string to slice of runes:

br := []rune(sr)
fmt.Println(br) //[181 181 181 181]

The %q (quoted) verb escapes any non-printable byte sequences in a string so the output is unambiguous; with the plus flag (%+q) it also escapes every non-ASCII character:

fmt.Printf("%+q \n", "Hello, 世界") // "Hello, \u4e16\u754c"
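If you need to get the original string back, which is what unvis(3) does, the strconv package can do the round trip: strconv.QuoteToASCII produces the same kind of Go-syntax escaping as %+q, and strconv.Unquote reverses it. A minimal sketch; the names visQuote and unvisQuote are hypothetical wrappers for this example, and the output is Go string-literal syntax, not the vis(3) format.

```go
package main

import (
	"fmt"
	"strconv"
)

// visQuote is a hypothetical helper: it returns s as a double-quoted,
// ASCII-only Go string literal with control and non-ASCII runes escaped.
func visQuote(s string) string {
	return strconv.QuoteToASCII(s)
}

// unvisQuote is a hypothetical helper: it parses a quoted Go string
// literal and returns the string it denotes.
func unvisQuote(q string) (string, error) {
	return strconv.Unquote(q)
}

func main() {
	original := "tab\there, 世界\r\n"

	quoted := visQuote(original)
	fmt.Println(quoted) // "tab\there, \u4e16\u754c\r\n"

	back, err := unvisQuote(quoted)
	if err != nil {
		fmt.Println("unquote error:", err)
		return
	}
	fmt.Println(back == original) // true
}
```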

unicode.IsSpace reports whether the rune is a space character as defined by Unicode's White Space property; in the Latin-1 space this is '\t', '\n', '\v', '\f', '\r', ' ', U+0085 (NEL), U+00A0 (NBSP).

Sample code:

package main

import (
    "bytes"
    "fmt"
    "unicode"
)

func main() {
    var buf bytes.Buffer
    s := "\u4e16\u754c \u0020\r\n  世界"
    for _, r := range s {
        if unicode.IsSpace(r) {
            buf.WriteString(fmt.Sprintf("\\u%04x", r))
        } else {
            buf.WriteString(string(r))
        }
    }
    st := buf.String()
    fmt.Println(st)
}

output:

世界\u0020\u0020\u000d\u000a\u0020\u0020世界
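The loop above only encodes, and its output cannot be decoded unambiguously if the input already contains a literal backslash. Here is a rough sketch of an encode/decode pair that also escapes the backslash itself. The function names escapeSpaces and unescapeSpaces are made up for this example, and it assumes the input is valid UTF-8 (invalid bytes would come back as U+FFFD).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// escapeSpaces (hypothetical helper) writes every whitespace rune and every
// backslash as \uXXXX and copies all other runes unchanged.
func escapeSpaces(s string) string {
	var b strings.Builder
	for _, r := range s {
		if unicode.IsSpace(r) || r == '\\' {
			fmt.Fprintf(&b, "\\u%04x", r)
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// unescapeSpaces (hypothetical helper) reverses escapeSpaces. Any backslash
// that does not start a well-formed \uXXXX sequence is reported as an error.
func unescapeSpaces(s string) (string, error) {
	var b strings.Builder
	for i := 0; i < len(s); {
		if s[i] != '\\' {
			b.WriteByte(s[i]) // plain byte, possibly part of a multi-byte rune
			i++
			continue
		}
		if i+6 > len(s) || s[i+1] != 'u' {
			return "", fmt.Errorf("malformed escape at byte %d", i)
		}
		n, err := strconv.ParseUint(s[i+2:i+6], 16, 32)
		if err != nil {
			return "", fmt.Errorf("malformed escape at byte %d: %w", i, err)
		}
		b.WriteRune(rune(n))
		i += 6
	}
	return b.String(), nil
}

func main() {
	original := "世界 \r\n世界 a\\b"

	escaped := escapeSpaces(original)
	fmt.Println(escaped) // 世界\u0020\u000d\u000a世界\u0020a\u005cb

	back, err := unescapeSpaces(escaped)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(back == original) // true
}
```

Four hex digits are enough here because every rune for which unicode.IsSpace returns true lies below U+10000.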

You can find more functions in the unicode/utf8, unicode, strconv and strings packages:
https://golang.org/pkg/unicode/utf8/
https://golang.org/pkg/unicode/
https://golang.org/pkg/strings/
https://golang.org/pkg/strconv/

https://blog.golang.org/strings

Upvotes: 1
