Reputation: 391
I'm writing a Lisp program that fetches a page from a Chinese website, and I've run into a problem decoding the Chinese text from the binary stream. I already have a vector of (unsigned-byte 8) containing the whole page, but when I pass it to babel:octets-to-string, it throws an exception.
(setf buffer (babel:octets-to-string buffer :encoding :utf-8))
The exception is:
Illegal :UTF-8 character starting at position 437. [Condition of type BABEL-ENCODINGS:INVALID-UTF8-CONTINUATION-BYTE]
I found that this exception is thrown whenever the decoder encounters a Chinese character. How can I solve it?
Upvotes: 4
Views: 771
Reputation: 4469
The error message says it all: there is an invalid UTF-8 byte sequence in your data.
The most probable cause is that the page text is not encoded in UTF-8 at all but in some other encoding used for Chinese text. Check the 'Content-Type' HTTP response header and the HTML 'META HTTP-EQUIV' tag to see which encoding the page declares, then pass that encoding to babel:octets-to-string instead of :utf-8.
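If the declared encoding is not at hand, one option is to try a short list of candidate encodings in order. The following is only a rough sketch: the helper name `decode-with-fallback` is hypothetical, the default list `(:utf-8 :gbk)` is a guess at what a Chinese page might use, and loading Babel through Quicklisp is an assumption about your setup. It relies on Babel signalling a subtype of `babel-encodings:character-decoding-error` when the bytes do not fit an encoding, as the INVALID-UTF8-CONTINUATION-BYTE condition in the question does.

```lisp
;; Assumption: Quicklisp is available. Load Babel some other way if it is not.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload "babel"))

;; Hypothetical helper, not part of Babel.
(defun decode-with-fallback (octets &optional (encodings '(:utf-8 :gbk)))
  "Try each encoding in ENCODINGS in order. Return the decoded string and
the encoding that worked as two values, or NIL if every attempt failed."
  (dolist (encoding encodings nil)
    (handler-case
        (return (values (babel:octets-to-string octets :encoding encoding)
                        encoding))
      ;; Decoding failed for this candidate; fall through to the next one.
      (babel-encodings:character-decoding-error ()
        nil))))
```

With this in place, the call from the question would become `(setf buffer (decode-with-fallback buffer))`. Guessing is less reliable than reading the charset the server actually declares, since a lenient decoder may accept bytes that belong to a different encoding, so prefer the declared value whenever it is present.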
Upvotes: 6