dosia96

Why is text still garbled after transcoding Windows-1252 to UTF-8 in Go?

I captured a batch of HTTP requests through traffic recording, and their response bodies use different character encodings, such as UTF-8 and windows-1252. I wrote some code that detects each body's encoding and converts it to UTF-8. However, after converting to UTF-8 and saving the result to a CSV file, the text is still garbled. When I print the detected charset of the converted []byte, it shows UTF-8. Can someone help me?

Here's my code for converting:

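```go
package main

import (
	"fmt"
	"log"

	"github.com/saintfish/chardet" // assumed: the chardet package providing NewTextDetector
	"golang.org/x/net/html/charset"
	"golang.org/x/text/encoding/charmap"
	"golang.org/x/text/transform"
)

// decodeToUTF8 detects the charset of data and transcodes it to a UTF-8 string.
func decodeToUTF8(data []byte) (string, error) {
	detector := chardet.NewTextDetector()
	charsetResult, err := detector.DetectBest(data)
	if err != nil {
		return "", fmt.Errorf("detect charset: %w", err)
	}

	log.Println("charset:", charsetResult.Charset)

	var transformer transform.Transformer
	if charsetResult.Charset == "windows-1252" {
		// If the encoding is windows-1252, handle it specially
		transformer = charmap.Windows1252.NewDecoder()
	} else {
		enc, _ := charset.Lookup(charsetResult.Charset)
		if enc == nil {
			return "", fmt.Errorf("unsupported charset: %q", charsetResult.Charset)
		}
		transformer = enc.NewDecoder()
	}

	// Convert to UTF-8 using the detected encoding
	decodedData, _, err := transform.Bytes(transformer, data)
	if err != nil {
		return "", fmt.Errorf("transcode from %s: %w", charsetResult.Charset, err)
	}

	return string(decodedData), nil
}
```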
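If some bodies are already UTF-8 but get detected as windows-1252, decoding them again would garble them. A minimal sketch of one guard, assuming that is what happens here: check `utf8.Valid` first and only call the detector-based decoder for other input. `toUTF8` and `decodeFunc` are hypothetical names, and the stand-in decoder in `main` is a placeholder for `decodeToUTF8`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeFunc matches the signature of decodeToUTF8 from the question.
type decodeFunc func([]byte) (string, error)

// toUTF8 is a hypothetical wrapper: bytes that are already valid UTF-8 are
// returned unchanged, and only other input is passed to the charset decoder.
func toUTF8(data []byte, decode decodeFunc) (string, error) {
	if utf8.Valid(data) {
		return string(data), nil
	}
	return decode(data)
}

func main() {
	// Stand-in decoder for the example; in real code this would be decodeToUTF8.
	fake := func(b []byte) (string, error) { return "decoded", nil }

	s, _ := toUTF8([]byte("café"), fake)
	fmt.Println(s) // café (valid UTF-8, so the decoder is not called)

	// "café" in windows-1252: the lone 0xE9 byte is not valid UTF-8.
	s, _ = toUTF8([]byte{0x63, 0x61, 0x66, 0xE9}, fake)
	fmt.Println(s) // decoded
}
```

The check is a heuristic: short windows-1252 text can occasionally be valid UTF-8 by chance, and pure ASCII is identical in both encodings, so it passes through untouched either way.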

And here's the result after converting to UTF-8: [screenshot of the garbled output]

I want input in any character encoding to be output as UTF-8.
