Reputation: 401
I am reading from a usocket stream using the code below:
(let ((stream (socket-stream sk)))
  (loop for line = (read-line stream nil)
        while line do (format t "~a~%" line)))
When read-line meets a non-ASCII character, it signals an error:
decoding error on stream
#<SB-SYS:FD-STREAM
for "socket 118.229.141.195:52946, peer: 119.75.217.109..."
{BCA02F1}>
(:EXTERNAL-FORMAT :UTF-8):
the octet sequence (176) cannot be decoded.
[Condition of type SB-INT:STREAM-DECODING-ERROR]
Neither read-line nor read-byte works, so I tried trivial-utf-8's read-utf-8-string, but it only accepts a binary stream, and socket-stream does not seem to create one. How can I read from a socket stream that contains non-ASCII characters?
Upvotes: 2
Views: 582
Reputation: 15759
The error you're getting indicates that the data you're trying to read is not actually valid UTF-8. Indeed, 176 (= #b10110000) is not a byte that can introduce a UTF-8 character: its top two bits are 10, which marks a continuation byte. If the data is in some other encoding, try adjusting your Lisp implementation's external format setting accordingly, or use Babel or FLEXI-STREAMS to decode the data.
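To make the point about byte 176 concrete, here is a minimal sketch that classifies a single byte by the role it can play in UTF-8. The function name utf-8-byte-kind and the keyword results are made up for illustration; they are not part of any library.

```lisp
(defun utf-8-byte-kind (byte)
  "Classify BYTE (an integer from 0 to 255) by its role in UTF-8."
  (cond ((<= byte #x7F) :ascii)          ; 0xxxxxxx
        ((<= byte #xBF) :continuation)   ; 10xxxxxx, never starts a character
        ((<= byte #xC1) :invalid)        ; would only encode overlong forms
        ((<= byte #xDF) :lead-2)         ; 110xxxxx
        ((<= byte #xEF) :lead-3)         ; 1110xxxx
        ((<= byte #xF4) :lead-4)         ; 11110xxx, up to U+10FFFF
        (t :invalid)))                   ; F5..FF never appear in UTF-8

(utf-8-byte-kind 176) ; => :CONTINUATION
```

A decoder that sees a :continuation byte where a character should start has to reject the input, which is what the STREAM-DECODING-ERROR is reporting.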
Upvotes: 1
Reputation:
I once needed this and was too lazy to look for a library, so I wrote it myself :) It may not be the best way, but I only needed something quick and uncomplicated, so here it goes:
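(defun read-utf8-char (stream)
  "Read one UTF-8 encoded character from the binary STREAM.
Return NIL at end of file. The input is not validated."
  (let ((first-byte (read-byte stream nil nil)))
    (when first-byte
      ;; Find the highest clear bit I; the lead byte has (- 7 I) leading ones.
      (loop for i from 7 downto 0
            when (or (not (logbitp i first-byte)) (= i 0))
              do (let ((code (logand first-byte (1- (ash 1 i)))))
                   ;; Each continuation byte contributes its low six bits.
                   (dotimes (a (- 6 i))
                     (setf code (+ (ash code 6)
                                   (logand (read-byte stream) #x3F))))
                   (return (code-char code)))))))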
Upvotes: 0
Reputation: 9451
You can first call read-sequence (if you know the length ahead of time) or call read-byte while bytes remain, and then convert the octets to a string with (babel:octets-to-string octets :encoding :utf-8), where octets is (make-array expected-length :element-type '(unsigned-byte 8)).
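When the length is not known in advance, the read-byte loop might look like the sketch below. The name read-all-octets is hypothetical, and the stream is assumed to be a binary one, that is, opened with element type (unsigned-byte 8). On a socket, reading until end of file only returns once the peer closes the connection.

```lisp
(defun read-all-octets (stream)
  "Read octets from the binary STREAM until end of file; return them as a vector."
  (let ((octets (make-array 0 :element-type '(unsigned-byte 8)
                              :adjustable t :fill-pointer 0)))
    ;; READ-BYTE returns NIL at end of file because of the extra arguments.
    (loop for byte = (read-byte stream nil nil)
          while byte
          do (vector-push-extend byte octets))
    ;; Copy into a simple octet vector, the form decoding functions usually expect.
    (coerce octets '(simple-array (unsigned-byte 8) (*)))))
```

The returned vector can then be handed to babel:octets-to-string as described above, with an :encoding that matches what the peer actually sends.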
Upvotes: 1