fabda01

Reputation: 3763

Concatenate / Split two bytes using unique character

I am encoding two images into bytes and then concatenating them into a single bytes blob to send over a socket.

On the server side, I want to split this blob back into two byte sequences so that I can decode each one back to its original form.

Since images do not contain commas, I am concatenating the bytes using the , character, but when I try to split the blob back, I get many chunks of bytes instead of two.

This is what I have so far:

import requests
from io import BytesIO

url = "https://upload.wikimedia.org/wikipedia/en/7/7d/Lenna_%28test_image%29.png"

img = BytesIO(requests.get(url).content).read()
data = b",".join([img, img])
img1, img2 = data.split(b",")

In my example, the data is split into 3646 sub-sequences of bytes.

  1. What is wrong with my code?
  2. What is the fix for it?

Upvotes: 1

Views: 421

Answers (2)

Daweo

Reputation: 36520

"Since images do not contain commas, I am concatenating the bytes using the , character, but when I try to split the blob back, I get many chunks of bytes instead of two."

An image file can contain any byte value, including , (\x2C). If you want to join and split on a separator, you first have to transform each image so that it never contains that byte. One way to do this is the built-in Python module base64. The base64.b64encode function accepts bytes and returns bytes encoded with the Base64 algorithm, which contain only the characters listed in Table 1: The Base 64 Alphabet of RFC 3548, so you can join images encoded this way using any character that is not in that table. After receiving the blob, split it on the character you chose and pass each element to base64.b64decode to recover the original bytes. Keep in mind that this method increases the size of the messages you send, by roughly one third.
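A minimal sketch of that approach, assuming a comma as the separator (it is not part of the Base64 alphabet). The helper names pack_b64 and unpack_b64 are made up for this illustration, and the two payloads are small stand-ins for real image bytes that deliberately contain commas:

```python
import base64

SEPARATOR = b","  # assumption: any byte outside the Base64 alphabet would do


def pack_b64(*images):
    """Base64-encode each blob and join the encoded parts with SEPARATOR."""
    return SEPARATOR.join(base64.b64encode(img) for img in images)


def unpack_b64(data):
    """Split on SEPARATOR and decode each part back to the original bytes."""
    return [base64.b64decode(part) for part in data.split(SEPARATOR)]


# Stand-in payloads that contain the separator byte on purpose.
img1 = b"\x89PNG,\x00,\xff"
img2 = b",,binary,,"

blob = pack_b64(img1, img2)
restored = unpack_b64(blob)
```

Because the encoded parts cannot contain a comma, the only comma in blob is the separator, and restored holds the two original byte strings.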

Upvotes: 2

thebjorn

Reputation: 27321

As explained in the comment, b',' is a single byte with the integer value 44, which is likely to occur in random byte data (or image data).
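A small illustration of why the split produces so many pieces. The data here is a synthetic stand-in for image bytes (every byte value, repeated four times), not the actual PNG from the question:

```python
# Stand-in for image data: every possible byte value, repeated four times.
data = bytes(range(256)) * 4

comma_value = b","[0]            # integer value of the comma byte
comma_count = data.count(b",")   # how often that byte occurs in the payload

# Join two copies with a comma, as in the question, then split again.
chunks = (data + b"," + data).split(b",")

print(comma_value)   # 44
print(comma_count)   # 4
print(len(chunks))   # 10
```

Splitting always yields one more chunk than there are comma bytes in the joined blob: here 4 + 1 + 4 = 9 commas give 10 chunks. By the same arithmetic, the 3646 chunks reported in the question would correspond to 3645 comma bytes, that is 1822 in each copy of the image plus the one separator.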

You can encode random bytes in many different ways, but most of them will include storing the length of the data.

For example, if you store the number of images first and then, for each image, its size in bytes followed by its data, you get something like this:

def join_images(*images):
    res = b'%d:' % len(images)              # header: the number of images
    for img in images:
        res += b'%d:%s;' % (len(img), img)  # size, raw bytes, then a ';' terminator
    return res

where the resulting byte string will look something like 2:3453:...3453 bytes of binary data...;123:...123 bytes of binary data...;

To extract the images, you need to parse the format you've created, e.g.

def split_images(data):
    count, data = data.split(b':', 1)     # extract the number of images
    count = int(count)
    images = []
    for i in range(count):
        size, data = data.split(b':', 1)  # extract the size of this image
        size = int(size)
        images.append(data[:size])        # grab size bytes from the data
        data = data[size+1:]              # make data point to the next image (+1 for the semicolon)
    return images
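A variation on the same length-prefix idea, not taken from the code above: instead of writing the sizes as ASCII digits, each blob is preceded by a fixed-width binary length using the standard struct module. The names pack_blobs and unpack_blobs and the 4-byte big-endian header are assumptions chosen for this sketch:

```python
import struct

# Assumption: a 4-byte unsigned big-endian length in front of every blob.
HEADER = struct.Struct("!I")


def pack_blobs(*blobs):
    """Prefix each blob with its length as a fixed-width binary header."""
    return b"".join(HEADER.pack(len(blob)) + blob for blob in blobs)


def unpack_blobs(data):
    """Walk the buffer: read a length header, then that many bytes."""
    blobs = []
    offset = 0
    while offset < len(data):
        (size,) = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        blobs.append(data[offset:offset + size])
        offset += size
    return blobs


# Stand-in payloads containing bytes that would break a naive separator.
payloads = [b"a,b;c:d", b"", b"\x00\xff,"]
packed = pack_blobs(*payloads)
unpacked = unpack_blobs(packed)
```

With a fixed-width header the reader never has to search for a delimiter, so the payload may contain any byte value. The sketch does no validation of truncated input.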

Upvotes: 1
