Reputation: 801
I am not able to reproduce the exact output of the contents property of the dcc.Upload component.
If I upload the file my_excel.xlsx to the dcc.Upload component, my callback function receives a "base64 encoded string" (according to the dcc.Upload documentation). I don't know how to reproduce the exact same string without the dcc.Upload component (I want to use the output for unit tests).
My current approach:
import base64
import io
import pandas as pd
# Attempt to reproduce the output of the dcc.Upload component
with open('tests/data/my_excel.xlsx', 'rb') as file:
    raw_data = file.read()
# raw_data is supposed to match the output I receive from the dcc.Upload component
# The following steps raise no error with the real output of dcc.Upload
_, content_string = raw_data.split(',')  # this fails
decoded = base64.b64decode(content_string)
df = pd.read_excel(io.BytesIO(decoded))
I get the error TypeError: a bytes-like object is required, not 'str'.
If I add raw_data = base64.b64encode(raw_data) before the raw_data.split(','), I get the same error.
How do I get the exact same "base64 encoded string" without the dcc.Upload Component?
Upvotes: 0
Views: 1895
Reputation: 76
I could not find a single function that reproduces the contents property of dcc.Upload, but I was able to build the same output manually.
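As background, the TypeError in the question appears to come from the data type rather than from the file itself: a file opened in 'rb' mode returns bytes, and base64.b64encode also returns bytes, so splitting on the str ',' fails in both cases. A minimal sketch, using a made-up byte string as a stand-in for the file contents:

```python
import base64

# Stand-in for file.read() in 'rb' mode (the bytes are illustrative only)
raw_data = b"example file bytes"

# Splitting bytes with a str separator raises the TypeError from the question
try:
    raw_data.split(",")
except TypeError as exc:
    error_message = str(exc)

# b64encode returns bytes as well, so the same split would fail again
encoded = base64.b64encode(raw_data)

# Decoding to str gives the text form that dcc.Upload passes to the callback
as_text = encoded.decode("ascii")
```

Even after decoding, the string contains no comma yet, because the data-URI prefix is still missing.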
From the documentation we have:
contents is a base64 encoded string that contains the files contents [...]
Property accept (string; optional): Allow specific types of files. See https://github.com/okonet/attr-accept for more information. Keep in mind that mime type determination is not reliable across platforms. CSV files, for example, are reported as text/plain under macOS but as application/vnd.ms-excel under Windows. In some cases there might not be a mime type set at all.
Inspecting the contents string reveals that it is composed of two strings separated by a comma:
content_type, content_string = contents.split(',')
Inspecting further shows:
content_type: contains the MIME type information of the file
content_string: the base64 encoded content of the file
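To make the two parts concrete, here is a small sketch that builds a contents-style string by hand from the bytes b"hello" and takes it apart again. The text/plain MIME type and the payload are made up for illustration; a real upload carries whatever type the browser reports.

```python
import base64

# Hand-built contents string in the same "data:<mime>;base64,<payload>" shape
contents = "data:text/plain;base64," + base64.b64encode(b"hello").decode("ascii")

# Split once at the first comma: prefix on the left, payload on the right
content_type, content_string = contents.split(",", 1)

# Extract the bare MIME type from the prefix
mime_type = content_type[len("data:"):].split(";")[0]

# Decode the payload back to the original bytes
payload = base64.b64decode(content_string)
```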
import base64
import io
import pandas as pd
import magic  # third-party package: python-magic
filepath = 'tests/data/my_excel.xlsx'
# Reproduce the output of the dcc.Upload component
with open(filepath, "rb") as file:
    decoded = file.read()
content_bytes = base64.b64encode(decoded)
content_string = content_bytes.decode("utf-8")
mime = magic.Magic(mime=True)
mime_type = mime.from_file(filepath)
content_type = "".join(["data:", mime_type, ";base64"])
contents = "".join([content_type, ",", content_string])
# and now revert: convert contents to binary file stream
content_type, content_string = contents.split(",")
decoded = base64.b64decode(content_string)
df = pd.read_excel(io.BytesIO(decoded))
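For unit tests it may be handy to wrap these steps in a function. The sketch below is one possible variant, not part of the original approach: make_upload_contents is a hypothetical helper name, and it swaps python-magic for the standard-library mimetypes module, which guesses the type from the file extension only. The MIME type a browser reports may differ, so treat the prefix as an approximation.

```python
import base64
import mimetypes
import os
import tempfile


def make_upload_contents(filepath):
    """Hypothetical helper: build a dcc.Upload-style contents string for a file."""
    # Assumption: guessing from the extension is good enough for tests
    mime_type, _ = mimetypes.guess_type(filepath)
    if mime_type is None:
        mime_type = "application/octet-stream"
    with open(filepath, "rb") as file:
        content_string = base64.b64encode(file.read()).decode("ascii")
    return f"data:{mime_type};base64,{content_string}"


# Demonstration with a throwaway text file
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    with open(sample_path, "wb") as file:
        file.write(b"hello")
    contents = make_upload_contents(sample_path)

# Revert, as a callback would
content_type, content_string = contents.split(",", 1)
decoded = base64.b64decode(content_string)
```

In a test, the returned string can be passed directly to the function that the callback uses to parse the upload.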
Upvotes: 4