I captured a batch of HTTP requests through traffic recording, and their response bodies use different character encodings, such as UTF-8 and windows-1252. I wrote some code that detects each body's encoding and converts it to UTF-8. However, after converting the bodies and saving them to a CSV file, the text is still garbled. When I run charset detection on the converted []byte, it reports UTF-8. Can someone help me?
Here's my code for converting:
import (
	"fmt"
	"log"

	"github.com/saintfish/chardet"
	"golang.org/x/net/html/charset"
	"golang.org/x/text/encoding/charmap"
	"golang.org/x/text/transform"
)

// decodeToUTF8 detects the charset of data and converts it to a UTF-8 string.
func decodeToUTF8(data []byte) (string, error) {
	detector := chardet.NewTextDetector()
	charsetResult, err := detector.DetectBest(data)
	if err != nil {
		return "", fmt.Errorf("detect charset: %w", err)
	}
	log.Println("charset:", charsetResult.Charset)

	var transformer transform.Transformer
	if charsetResult.Charset == "windows-1252" {
		// If the encoding is windows-1252, handle it specially
		transformer = charmap.Windows1252.NewDecoder()
	} else {
		encoding, _ := charset.Lookup(charsetResult.Charset)
		if encoding == nil {
			return "", fmt.Errorf("unsupported charset: %s", charsetResult.Charset)
		}
		transformer = encoding.NewDecoder()
	}

	// Convert to UTF-8 using the detected encoding
	decodedData, _, err := transform.Bytes(transformer, data)
	if err != nil {
		return "", fmt.Errorf("decode from %s: %w", charsetResult.Charset, err)
	}
	return string(decodedData), nil
}
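One possible cause worth ruling out: chardet's result is a statistical guess, so a body that is already UTF-8 (especially short or mostly-ASCII text) may be labeled as a single-byte charset such as windows-1252. Decoding it again turns each multi-byte character into two or three Latin characters ("é" becomes "Ã©"). That output is still valid UTF-8, which would match detection reporting UTF-8 on the converted bytes. A minimal sketch, assuming this is the issue: check `utf8.Valid` on the raw bytes first and only fall back to charset conversion when that check fails. To stay standard-library-only, the fallback here decodes as ISO-8859-1 (each byte becomes the code point of the same value) as a stand-in for the chardet + `charset.Lookup` path; this is not identical to windows-1252, which differs in the 0x80 to 0x9F range. The helper names `toUTF8` and `latin1ToUTF8` are hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// latin1ToUTF8 is a hypothetical stand-in for the chardet/charset.Lookup path:
// it treats every input byte as an ISO-8859-1 code point.
func latin1ToUTF8(data []byte) string {
	runes := make([]rune, len(data))
	for i, b := range data {
		runes[i] = rune(b)
	}
	return string(runes)
}

// toUTF8 returns data unchanged when it is already valid UTF-8 and only
// falls back to the legacy decoder otherwise.
func toUTF8(data []byte) string {
	if utf8.Valid(data) {
		return string(data)
	}
	return latin1ToUTF8(data)
}

func main() {
	utf8Body := []byte("café")                   // UTF-8 bytes: 63 61 66 C3 A9
	latin1Body := []byte{0x63, 0x61, 0x66, 0xE9} // "café" in ISO-8859-1

	fmt.Println(toUTF8(utf8Body))   // café
	fmt.Println(toUTF8(latin1Body)) // café

	// Decoding UTF-8 bytes as a single-byte charset anyway produces mojibake.
	fmt.Println(latin1ToUTF8(utf8Body)) // cafÃ©
}
```

In the question's code, the same idea would mean returning `string(data)` early when `utf8.Valid(data)` is true, before calling the detector.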
And here's the result after converting to UTF-8: [screenshot not included]
I want input in any character encoding to come out as UTF-8.