Reputation: 2386
I'm trying to implement my own cyclic redundancy check (CRC) in Python. The layout of my program is as follows:
1. Generate a random byte message.
2. Generate a checksum value for the message using crc16.
3. Run corrupt_data on the generated message.
4. Check whether the checksum is different (using ==).

I am confident that the methods crc16 and corrupt_data are correct, so I don't think there's much reason to analyse them too closely. I think the problems start with my use of byte strings in the second half of my program, after those two methods.
My code is as follows:
from random import random
from random import choice
from string import ascii_uppercase
CORRUPTION_RATE = 0.25
def crc16(data: bytes):
    xor_in = 0x0000  # initial value
    xor_out = 0x0000  # final XOR value
    poly = 0x8005  # generator polynomial (normal form)

    reg = xor_in
    for octet in data:
        # reflect in
        for i in range(8):
            topbit = reg & 0x8000
            if octet & (0x80 >> i):
                topbit ^= 0x8000
            reg <<= 1
            if topbit:
                reg ^= poly
        reg &= 0xFFFF
        # reflect out
    return reg ^ xor_out
from random import randbytes
def corrupt_data(data: bytes):
    '''
    some random corruption of byte data
    can be modified as needed using the CORRUPTION_RATE global constant
    '''
    temp = data[:]
    while True:
        location = int(len(temp) * random())
        data_list = list(temp)
        if random() < 0.5:
            data_list[location] = (data_list[location] + 1) % 256
        else:
            data_list[location] = (data_list[location] - 1) % 256
        temp = bytes(data_list)
        if random() < CORRUPTION_RATE and temp != data:
            break
    return temp
# Generate random byte message of length n
def random_message(n):
    randomBytes = ''.join(choice(ascii_uppercase) for i in range(n)).encode()
    print("randomBytes is " + str(randomBytes))
    print("The class type of randomBytes is " + str(type(randomBytes)))
    return randomBytes
numberOfErrors = 0
for i in range(10000):
    # generating random byte message of length n
    randomMessage = random_message(5)
    # generating the checksum value using the CRC code
    checksumValue = crc16(randomMessage)
    #print("checksumValue is " + str(checksumValue))
    #print("The class type of checksumValue is " + str(type(checksumValue)))
    # running the corruption on the generated message
    #print("The class type of bchecksumValue is " + str(type(b"checksumValue")))
    corrupt = corrupt_data(b"checksumValue")
    #print("The class type of corrupt_data(bchecksumValue) is " + str(type(corrupt)))
    #print("Checking whether the checksum is different ... ")
    different = (b"checksumValue" == corrupt)
    #print("bchecksumValue == corrupt is " + str(different))
    #print("bchecksumValue was " + str(b"checksumValue") + ", and corrupt was " + str(corrupt))
    if(different == False):
        numberOfErrors += 1
print("numberOfErrors is " + str(numberOfErrors))
As you can see, I've inserted various (now commented out) print statements to help me with debugging.
The problem is that, when I run the above code, I get that numberOfErrors is 10000. Obviously, this can't be correct: we expect at least some of the comparisons to succeed, so numberOfErrors should be somewhat less than 10000.
As I said, I am confident that the crc16 and corrupt_data functions are correct, and I suspect that the problem is arising somewhere in my use of the byte strings inside the for loop:
numberOfErrors = 0
for i in range(10000):
    # generating random byte message of length n
    randomMessage = random_message(5)
    # generating the checksum value using the CRC code
    checksumValue = crc16(randomMessage)
    #print("checksumValue is " + str(checksumValue))
    #print("The class type of checksumValue is " + str(type(checksumValue)))
    # running the corruption on the generated message
    #print("The class type of bchecksumValue is " + str(type(b"checksumValue")))
    corrupt = corrupt_data(b"checksumValue")
    #print("The class type of corrupt_data(bchecksumValue) is " + str(type(corrupt)))
    #print("Checking whether the checksum is different ... ")
    different = (b"checksumValue" == corrupt)
    #print("bchecksumValue == corrupt is " + str(different))
    #print("bchecksumValue was " + str(b"checksumValue") + ", and corrupt was " + str(corrupt))
    if(different == False):
        numberOfErrors += 1
print("numberOfErrors is " + str(numberOfErrors))
I've never really programmed with bytes / byte strings, and I've also only just recently started learning Python, so I don't understand what I'm doing incorrectly. Where's my error, and how do I fix it?
As mentioned by user2357112 supports Monica in the comments, the problem might be b"checksumValue" in corrupt = corrupt_data(b"checksumValue"). The problem I had was that the function crc16 returns an int, so, in order to convert it back into bytes for passing into the function corrupt_data(data : bytes), I tried using the b prefix. I guess this is my Python inexperience showing.
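To make the point about the b prefix concrete, here is a minimal sketch. The value 10073 is only an example; the point is that b"checksumValue" is a bytes literal spelling out the name, and it never reads the variable.

```python
checksumValue = 10073  # example value only; any int behaves the same way

# The b prefix builds a literal from the characters between the quotes.
# It does not look up the variable called checksumValue.
literal = b"checksumValue"

print(literal)       # the ASCII characters of the name itself
print(len(literal))  # 13, regardless of what checksumValue holds
```

So every call to corrupt_data(b"checksumValue") in the loop receives the same 13 bytes, whatever the CRC turned out to be.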
EDIT2: OK, so I'm trying the solution offered in this answer. The modified code is as follows:
# running the corruption on the generated message
bs = str(checksumValue).encode('ascii')
print("str(checksumValue).encode('ascii') is " + str(bs))
#print("The class type of bchecksumValue is " + str(type(b"checksumValue")))
print("The class type of str(checksumValue).encode('ascii') is " + str(type(bs)))
#corrupt = corrupt_data(b"checksumValue")
corrupt = corrupt_data(bs)
#print("The class type of corrupt_data(bchecksumValue) is " + str(type(corrupt)))
print("The class type of corrupt_data(bs) is " + str(type(corrupt)))
The output is
randomBytes is b'BBVFC'
The class type of randomBytes is <class 'bytes'>
checksumValue is 10073
The class type of checksumValue is <class 'int'>
str(checksumValue).encode('ascii') is b'10073'
The class type of str(checksumValue).encode('ascii') is <class 'bytes'>
The class type of corrupt_data(bs) is <class 'bytes'>
So the classes seem to match with what we'd expect.
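For comparison, here is a small sketch of two ways to turn the int into bytes. str(...).encode('ascii') gives the decimal digits as text, while int.to_bytes gives a fixed-width binary form. Which one is appropriate depends on what the receiving side expects, which is not specified here, so treat the two-byte big-endian choice as an assumption.

```python
checksum_value = 10073  # the value printed in the output above

# Decimal text: one ASCII byte per digit, so the length varies with the value.
as_text = str(checksum_value).encode("ascii")

# Fixed-width binary: a 16-bit CRC always fits in exactly two bytes.
# Big-endian byte order is an assumption made for this sketch.
as_binary = checksum_value.to_bytes(2, "big")

print(as_text)                           # b'10073'
print(as_binary.hex())                   # 2759
print(int.from_bytes(as_binary, "big"))  # 10073
```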
Implementing the changes in EDIT2 in the for loop, I still get numberOfErrors is 10000 as my output. The code is as follows:
numberOfErrors = 0
for i in range(10000):
    # generating random byte message of length n
    randomMessage = random_message(5)
    # generating the checksum value using the CRC code
    checksumValue = crc16(randomMessage)
    #print("checksumValue is " + str(checksumValue))
    #print("The class type of checksumValue is " + str(type(checksumValue)))
    # running the corruption on the generated message
    bs = str(checksumValue).encode('ascii')
    #print("str(checksumValue).encode('ascii') is " + str(bs))
    #print("The class type of str(checksumValue).encode('ascii') is " + str(type(bs)))
    corrupt = corrupt_data(bs)
    #print("The class type of corrupt_data(bs) is " + str(type(corrupt)))
    #print("Checking whether the checksum is different ... ")
    different = (bs == corrupt)
    #print("bs == corrupt is " + str(different))
    #print("bs was " + str(bs) + ", and corrupt was " + str(corrupt))
    if(different == False):
        numberOfErrors += 1
print("numberOfErrors is " + str(numberOfErrors))
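One observation that may explain why the count stays at 10000: corrupt_data only leaves its loop once temp != data, so it can never return its input unchanged. The sketch below assumes the function is indented as shown (the logic is otherwise the same as in the question) and checks this directly.

```python
from random import random

CORRUPTION_RATE = 0.25

def corrupt_data(data: bytes) -> bytes:
    # Same logic as the function in the question, lightly condensed.
    temp = data[:]
    while True:
        location = int(len(temp) * random())
        data_list = list(temp)
        step = 1 if random() < 0.5 else -1
        data_list[location] = (data_list[location] + step) % 256
        temp = bytes(data_list)
        # The loop can only exit when the result differs from the input.
        if random() < CORRUPTION_RATE and temp != data:
            break
    return temp

bs = b"10073"
unchanged = sum(corrupt_data(bs) == bs for _ in range(1000))
print(unchanged)  # 0
```

Since bs == corrupt is therefore always False, the counter is incremented on every iteration, whatever bytes are passed in.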
Upvotes: 1
Views: 191
Reputation: 104682
Your issue isn't really with the byte strings; it's a logic error. You're corrupting the wrong thing: you don't want to corrupt the checksum, you want to corrupt the original message and then take a checksum of the corrupted version. You can then compare the two checksums to see whether they match.
Try:
undetected_errors = 0
for i in range(10000):
    good_message = random_message(5)
    good_checksum = crc16(good_message)
    corrupted_message = corrupt_data(good_message)
    corrupted_checksum = crc16(corrupted_message)
    if good_checksum == corrupted_checksum:
        undetected_errors += 1
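For anyone who wants to run the experiment end to end, here is a self-contained sketch of the same loop. The helpers are re-typed variants of the question's functions rather than exact copies. The rng argument and the seed parameter are additions made here for repeatable runs, and count_undetected is a hypothetical wrapper name.

```python
import random
from string import ascii_uppercase

CORRUPTION_RATE = 0.25

def crc16(data: bytes) -> int:
    """Bitwise CRC-16: polynomial 0x8005, zero initial value, no reflection."""
    poly = 0x8005
    reg = 0x0000
    for octet in data:
        for i in range(8):
            topbit = reg & 0x8000
            if octet & (0x80 >> i):
                topbit ^= 0x8000
            reg = (reg << 1) & 0xFFFF
            if topbit:
                reg ^= poly
    return reg

def corrupt_data(data: bytes, rng: random.Random) -> bytes:
    """Nudge random bytes up or down by one until the result differs."""
    temp = bytearray(data)
    while True:
        location = rng.randrange(len(temp))
        step = 1 if rng.random() < 0.5 else -1
        temp[location] = (temp[location] + step) % 256
        if rng.random() < CORRUPTION_RATE and bytes(temp) != data:
            return bytes(temp)

def random_message(n: int, rng: random.Random) -> bytes:
    """Random message of n uppercase ASCII letters."""
    return "".join(rng.choice(ascii_uppercase) for _ in range(n)).encode()

def count_undetected(trials: int, seed: int = 0) -> int:
    """Count corruptions that leave the CRC of the message unchanged."""
    rng = random.Random(seed)  # seed is an assumption, for repeatability
    undetected = 0
    for _ in range(trials):
        good_message = random_message(5, rng)
        corrupted_message = corrupt_data(good_message, rng)
        if crc16(good_message) == crc16(corrupted_message):
            undetected += 1
    return undetected

print("undetected errors:", count_undetected(10000))
```

The count should come out very small compared with the number of trials, since an undetected error requires the corrupted message to produce the same 16-bit CRC as the original.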
Upvotes: 1