Kevin mendieta perez

Reputation: 43

Some way to convert the string representation of a PDF into bytes in Python

I'm trying to do something, and I'm not sure whether it's a sound approach.

Problem:

I have a web client and a web server. The server (written in Python with Flask) processes a PDF file in order to extract some data, and the client just sends the PDF file and waits for the response. The thing is that the client can have several PDF files to process, and what I want is to send all the PDFs from the client to the server in a single request.

What I have planned to do:

I was thinking of converting the Blob of each PDF to a string and sending a POST request with a JSON body like this:

BODY:
{
    "content": [
        {"name": "pdf_name_1.pdf", "data": "some blob data converted to string"},
        {"name": "pdf_name_2.pdf", "data": "some blob data converted to string"},
        {"name": "pdf_name_3.pdf", "data": "some blob data converted to string"},
        ...
    ]
}

Then, on the server, I was thinking of converting the data back into a blob (bytes) in order to write the PDF to disk and start processing it.

My question:

Is there any way to convert the str representation of the PDF back to bytes so that I can write the PDF to disk with Python?

Thanks a lot. If someone comes up with another idea for sending a bunch of PDFs in a single request, please let me know.

P.S.: I'm using Python 3.5 and Flask for the web server.

Upvotes: 4

Views: 9685

Answers (1)

Federico Rubbi

Reputation: 734

In such cases, it's preferable to send the file data with the files keyword argument of requests.post, like so:

import requests


def send_pdf_data(filename_list):
    files = {}
    open_files = []

    try:
        for index, filename in enumerate(filename_list):
            # Open each PDF in binary read mode ('rb'); 'wb' would truncate it.
            pdf_file = open(filename, 'rb')
            open_files.append(pdf_file)
            # One multipart form field per PDF: (filename, file object, content type).
            # str.format is used so that this also runs on Python 3.5.
            files["pdf_name_{}.pdf".format(index)] = (filename, pdf_file, 'application/pdf')

        data = {}
        # *Put whatever you want in data dict*

        return requests.post("http://yourserveradders", data=data, files=files)
    finally:
        # Close the files once the request has been sent.
        for pdf_file in open_files:
            pdf_file.close()


def main():
    filename_list = ["pdf_name_1.pdf", "pdf_name_2.pdf"]
    send_pdf_data(filename_list)

if __name__ == '__main__':
    main()

However, if you really want to pass the data as JSON, you should use the base64 module, as @Mark Ransom mentioned.

You can implement it in this way:

import requests
import json
import base64


def encode(data: bytes) -> str:
    """
    Return base-64 encoded value of binary data as an ASCII string,
    so that it can be serialized to JSON.
    """
    return base64.b64encode(data).decode('ascii')


def decode(data: str):
    """
    Return decoded value of a base-64 encoded string.
    """
    return base64.b64decode(data.encode())


def get_pdf_data(filename):
    """
    Open pdf file in binary mode,
    return a string encoded in base-64.
    """
    with open(filename, 'rb') as file:
        return encode(file.read())


def send_pdf_data(filename_list, encoded_pdf_data):
    data = {}
    # *Put whatever you want in data dict*
    # Create content list: one {"name", "data"} entry per PDF.
    content = [{"name": filename, "data": pdf_data}
               for (filename, pdf_data) in zip(filename_list, encoded_pdf_data)]
    data["content"] = content

    data = json.dumps(data) # Convert it to json.
    # Declare the body as JSON so the server can parse it as such.
    requests.post("http://yourserveradders", data=data,
                  headers={"Content-Type": "application/json"})


def main():
    filename_list = ["pdf_name_1.pdf", "pdf_name_2.pdf"]
    pdf_blob_data = [get_pdf_data(filename) for filename
                     in filename_list]
    send_pdf_data(filename_list, pdf_blob_data)

if __name__ == '__main__':
    main()
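
On the server side, the reverse step could look roughly like the sketch below. It uses only the standard library and assumes the JSON body has already been parsed into a dict (in Flask that would typically come from request.get_json()). The function name save_pdfs and the out_dir parameter are hypothetical, chosen here for illustration only.

```python
import base64
import json
import os
import tempfile


def save_pdfs(payload, out_dir):
    """Decode each base-64 string in payload["content"] and write it to out_dir.

    Returns the list of file paths that were written.
    """
    written = []
    for item in payload["content"]:
        # basename() drops any directory part of a client-supplied name.
        name = os.path.basename(item["name"])
        # base-64 text (str) -> original PDF bytes.
        pdf_bytes = base64.b64decode(item["data"])
        path = os.path.join(out_dir, name)
        with open(path, "wb") as pdf_file:
            pdf_file.write(pdf_bytes)
        written.append(path)
    return written


if __name__ == "__main__":
    # Simulate the client's JSON body using stand-in bytes instead of a real PDF.
    fake_pdf = b"%PDF-1.4\n\x00\xff binary payload"
    body = json.dumps({"content": [
        {"name": "pdf_name_1.pdf",
         "data": base64.b64encode(fake_pdf).decode("ascii")},
    ]})
    with tempfile.TemporaryDirectory() as tmp_dir:
        paths = save_pdfs(json.loads(body), tmp_dir)
        with open(paths[0], "rb") as pdf_file:
            assert pdf_file.read() == fake_pdf
```

The bytes written to disk are identical to what the client read, which is why base-64 is used here rather than calling str() on the raw bytes.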

Upvotes: 1
