Reputation: 64774
I'm trying to track down a Python UnicodeDecodeError in the following log line:
10.210.141.123 - - [09/Nov/2011:14:41:04 -0800] "gfR\x15¢\x09ì|Äbk\x0F[×ÐÖà\x11CEÐÌy\x5C¿DÌj\x08Ï ®At\x07å!;f>\x08éPW¤\x1C\x02ö*6+\x5C\x15{,ªIkCRA\x22 xþP9â\x13h\x01¢è´\x1DzõWiË\x5C\x10sòʨR)¶²\x1F8äl¾¢{ÆNw\x08÷@ï" 400 166 0.000 "-" "-"
I opened the entire log file in Vim, and then yanked the line into a new file so I could test just the one line. However, my parsing script works OK with the new file - it doesn't throw a UnicodeDecodeError. I don't understand why the one file would generate an error and the other one would not, when they are (on the surface) identical.
Here's what I tried: running enca to determine the file encoding, which complained that it "Cannot determine (or understand) your language preferences." file -i says that both files are "Regular file"s. I also deleted every other line in the original log file and still got the error in one file and no error in the other. I tried deleting set encoding=utf-8 from my .vimrc, writing the file again, and I still got the error in one file and not in the other.
The logs are nginx logs. Nginx has this note in their release notes:
*) Change: now the 0x00-0x1F, '"' and '\' characters are escaped as \xXX
in an access_log.
Thanks to Maxim Dounin.
My Python script has with open('log_file') as f and the error comes up when I try to call json.dumps on a dict.
How can I track this down?
Upvotes: 0
Views: 209
Reputation: 82924
Your question: How can I track this down?
Answer:
(1) Show us the full text of the error message that you got. Without knowing which encoding you were trying to use, we can't tell you anything. A traceback and a snippet of code that reads the file and reproduces the error would also be handy.
(2) Write a tiny Python script to find the line in the file and then do:
print repr(the_line)    # Python 2.x
print(ascii(the_line))  # Python 3.x
and copy/paste the result into an edit of your question, so that we can see unambiguously what is in the line.
(3) It does look like random gibberish except for the ­ but do tell us whether you expect that line to be text (if so, in what human language?).
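For step (2), something along these lines might do as the "tiny Python script". It is only a sketch under a few assumptions: Python 3, that UTF-8 is the encoding being attempted (the question doesn't say), and the helper name find_suspect_lines is made up for illustration. It reads the file in binary mode so that nothing is decoded behind your back, and reports each line whose raw bytes fail to decode.

```python
import sys


def find_suspect_lines(path, encoding="utf-8"):
    """Yield (line_number, raw_bytes, error) for every line that fails to decode.

    The encoding default is an assumption; pass whichever one your script uses.
    """
    # Binary mode: we see exactly the bytes on disk, with no implicit decoding.
    with open(path, "rb") as f:
        for number, raw in enumerate(f, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as exc:
                yield number, raw, exc


if __name__ == "__main__" and len(sys.argv) > 1:
    for number, raw, exc in find_suspect_lines(sys.argv[1]):
        print("line", number, "->", exc)
        # repr() of a bytes object shows every non-ASCII byte as \xNN.
        print(repr(raw))
```

Running it on both the original log and the copy written from Vim, then comparing the repr output for the same line, should show whether the two files really contain the same bytes.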
Upvotes: 1