I wrote a search engine for BitTorrent as a learning exercise. As a node of the Mainline DHT, it sends and receives various messages that are encoded according to the bencode standard, and I am using the Go implementation by anacrolix to encode and decode them. Every minute or so (sometimes more often), I receive a message that should be valid bencode but fails to decode. I am wondering why this happens.
Is there some commonly used BitTorrent client that produces these invalid messages? Or is someone trying to exploit a vulnerability? The latter is my best guess. Alternatively, maybe I am bad at network programming and that's the reason for all the null bytes 🫠
Here are some samples. For each one, I give the decoder error on the first line, the sender's apparent origin, the hex-encoded raw bytes on the line marked "raw", and the UTF-8 interpretation (escaped as in JS) on the line marked "utf-8":
bencode: syntax error (offset: 4): unknown value type '\\x00'
(from a hosting service in Tehran, Iran)
raw: 64313a61000000000000000000000000000003023b1c16
utf-8: d1:a\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0003\u0002;\u001c\u0016
bencode: syntax error (offset: 0): unknown value type 'A'
(from NordVPN in the USA)
raw: 410000cd81f2a6a0000000000000000089420000
utf-8: A\u0000\u0000́ \u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000 B\u0000\u0000
bencode: syntax error (offset: 0): unknown value type 'A'
(from China)
raw: 4100b01bc1c14e38000000000000000039100000
utf-8: A\u0000 \u001b N8\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u00009\u0010\u0000\u0000
bencode: syntax error (offset: 4): unknown value type '\\x00'
(from Cyprus)
raw: 64313a61000000000000000000000000000003fadd1c16
utf-8: d1:a\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0003 \u001c\u0016
bencode: syntax error (offset: 4): unknown value type '\\x00'
(from South Korea)
raw: 64313a6100000000000000000000000000000325a41c16
utf-8: d1:a\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0003% \u001c\u0016
I have many more samples, but most of them seem to start a bencode dictionary whose first key is "a" (the key that KRPC queries use for their arguments), followed by a run of 0x00 bytes where the value should be. These messages are received over UDP, so I realize that it could just be someone sending random data to my search engine. However, it seems unlikely that random data would coincidentally form the beginning of a valid bencode dictionary, doesn't it?