Reputation: 6768
In DNS messages, domain names are represented like so:
Address Value
----------------------------------------------------------------
0x00C 0x5 s t o r e 0x7 e x a m p l e 0x3 c o m 0x0
0x023 0x4 b l o g 0xC012
At address 0x00C
we have store.example.com
and at address 0x023
we have blog.example.com
. 0xC012
represents a pointer to 0x012
where example.com
begins in the first domain name, so we don't have to repeat it.
I managed to turn a domain name into an array of labels using the following code:
labels = []
for part in self.domain.split('.'):
label = part.encode('ascii')
length = len(label).to_bytes(1, 'big')
labels.append(length + label)
This gives me the following array:
[b'\x04blog', b'\x07example', b'\x03com']
But I have trouble building a regular expression or an efficient function that would search the part of the message that's already written for either \x04blog\x07example\x03com\x00
or \x07example\x03com\x00
or \x03com00
. I'd like to find the longest match I can. I tried building a regex like this:
(((\x04blog)?\x07example)?\x03com)?\x00
But it seems the regex engine prefers just matching for \x00
in that case. How do I fix that?
There's also another issue: we could technically add my.blog.example.com
which would need to contain 0x2 m y 0xC023
with 0xC023
pointing to blog.example.com
.
Upvotes: 2
Views: 1926
Reputation: 12515
First sorry for the false hint I gave you, there are far simpler things to do. In fact there are far simpler things to do than using a regex here. regex are far too powerful to do this, you can achieve the same result without.
DNS message format is defined in RFC 1035 : "DOMAIN NAMES - IMPLEMENTATION AND SPECIFICATION"
Name compression is explained in section 4.1.4. Note also that you have to take into account the case insensitive aspect, as outlined in section 2.3.3
It should be the case in the following code that can surely be optimized, with example names taken from RFC:
#!/usr/bin/python3
names=['F.ISI.ARPA', 'FOO.F.ISI.ARPA', 'ARPA']
cache={}
pos=0
results=[]
for name in names:
labels = name.split('.')
while len(labels):
key = '.'.join(labels)
if key.lower() in cache:
results.append((cache[key.lower()] + 2**15 + 2**14).to_bytes(2, 'big'))
pos += 2
break
else:
label = labels.pop(0).encode('ascii')
length = len(label).to_bytes(1, 'big')
results.append(length + label)
cache[key.lower()] = pos
pos += len(label) + 1
else:
results.append(b'0')
pos += 1
print(results)
which produces:
[b'\x01F', b'\x03ISI', b'\x04ARPA', b'0', b'\x03FOO', b'\xc0\x00', b'\xc0\x06']
If instead using your example:
names=['store.example.com', 'blog.example.com', 'my.blog.example.com']
it produces:
[b'\x05store', b'\x07example', b'\x03com', b'0', b'\x04blog', b'\xc0\x06', b'\x02my', b'\xc0\x13']
Upvotes: 2