How can I combine several base64 audio chunks (from a microphone)?

Question

I receive base64-encoded chunks from a microphone.

I need to concatenate them and send them to the Google API as a single base64 string for speech recognition. Roughly speaking, the first chunk encodes the word Hello and the second encodes world!. I need to join the two chunks, send them to the Google API as one string, and receive Hello world! in response.

You can see Google Speech-to-Text as an example. Google also sends microphone data as a base64 string over WebSockets (see the Network tab).

Unfortunately, I don't have a microphone at hand, so I can't check this myself, and it needs to be done now.

Suppose I get

chunk1 = "TgvsdUvK ...."
chunk2 = "UZZxgh5V ...."

Am I right that it would be enough to just do

base64.b64encode(chunk1 + chunk2)

Or is there something else I need to know? Unfortunately, everything is held up by the lack of a microphone.

Random Davis · Accepted Answer

Your example of encoding chunk1 + chunk2 wouldn't work. The chunks are already base64, so encoding them again would double-encode the data, and base64 strings can have padding at the end. If you just concatenated two base64 strings together, the result would not decode to the combined data.

For example, the strings StringA and StringB, when their ASCII or UTF-8 representations are encoded in base64, are the following: U3RyaW5nQQ== and U3RyaW5nQg==. Each one of those can be decoded fine. But if you concatenated them, your result would be U3RyaW5nQQ==U3RyaW5nQg==, which is invalid:

import base64

concatenated_b64_strings = 'U3RyaW5nQQ==U3RyaW5nQg=='
concatenated_b64_strings_bytes = concatenated_b64_strings.encode('ascii')
# decoding stops at the first padding, so everything after it is silently dropped
decoded_strings = base64.b64decode(concatenated_b64_strings_bytes)
print(decoded_strings.decode('ascii')) # just outputs 'StringA', which is incorrect
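
As a small illustration that is not part of the original answer: padding depends on the length of the encoded data modulo 3. A length that is a multiple of 3 needs no padding, one leftover byte produces ==, and two leftover bytes produce =. That is why arbitrary chunks cannot be glued together at the base64 level. The sketch below also shows that passing validate=True to base64.b64decode makes the naive concatenation raise an error instead of being silently truncated. The variable names are only for illustration.

```python
import base64

# Padding is determined by the input length modulo 3.
padding_examples = {
    data: base64.b64encode(data).decode("ascii")
    for data in (b"abc", b"abcd", b"abcde")
}
for data, encoded in padding_examples.items():
    print(len(data), encoded)

# Strict validation rejects the naive concatenation instead of truncating it.
naive = "U3RyaW5nQQ==" + "U3RyaW5nQg=="
try:
    base64.b64decode(naive, validate=True)
    rejected = False
except ValueError:  # binascii.Error is a subclass of ValueError
    rejected = True
print("rejected:", rejected)
```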

So, in order to take those two strings (which I'm using as an example in place of binary data) and concatenate them together, starting with only their base64 representations, you have to decode them:

import base64

string1_base64 = 'U3RyaW5nQQ=='
string2_base64 = 'U3RyaW5nQg=='

# need to convert the strings to bytes first in order to decode them
base64_string1_bytes = string1_base64.encode('ascii')
base64_string2_bytes = string2_base64.encode('ascii')

# now, decode them into the actual bytes the base64 represents
base64_string1_bytes_decoded = base64.decodebytes(base64_string1_bytes)
base64_string2_bytes_decoded = base64.decodebytes(base64_string2_bytes)

# combine the bytes together
combined_bytes = base64_string1_bytes_decoded + base64_string2_bytes_decoded

# now, encode these bytes as base64
# (b64encode is used here because encodebytes inserts a newline every 76 characters and at the end)
combined_bytes_base64 = base64.b64encode(combined_bytes)

# finally, decode these bytes so you're left with a base64 string:
combined_bytes_base64_string = combined_bytes_base64.decode('ascii')
print(combined_bytes_base64_string) # output: U3RyaW5nQVN0cmluZ0I=

# let's prove that it concatenated successfully (you wouldn't do this in your actual code)
base64_combinedstring_bytes = combined_bytes_base64_string.encode('ascii')
base64_combinedstring_bytes_decoded_bytes = base64.decodebytes(base64_combinedstring_bytes)
base64_combinedstring_bytes_decoded_string = base64_combinedstring_bytes_decoded_bytes.decode('ascii')
print(base64_combinedstring_bytes_decoded_string) # output: StringAStringB

In your case, you'd be combining more than just two input base64 strings, but the process is the same. Take all the strings, encode each one to ASCII bytes, decode them via base64.decodebytes(), add the decoded bytes together via the += operator, and then encode the combined bytes as base64 once:

import base64

input_strings = ['U3RyaW5nQQ==', 'U3RyaW5nQg==']
input_strings_bytes = [input_string.encode('ascii') for input_string in input_strings]
input_strings_bytes_decoded = [base64.decodebytes(input_string_bytes) for input_string_bytes in input_strings_bytes]
combined_bytes = bytes()
for decoded in input_strings_bytes_decoded:
    combined_bytes += decoded
# b64encode produces a single line; encodebytes would insert newlines into longer output
combined_bytes_base64 = base64.b64encode(combined_bytes)
combined_bytes_base64_string = combined_bytes_base64.decode('ascii')
print(combined_bytes_base64_string) # output: U3RyaW5nQVN0cmluZ0I=
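
The same steps can be wrapped in a small helper. This is only a sketch: the name combine_base64_chunks is hypothetical, and it assumes every chunk is a complete, independently padded base64 string carrying raw bytes of the same audio stream. If each chunk carried its own file or container header, joining the decoded bytes would probably not give valid audio. It uses b"".join rather than repeated +=, which avoids re-copying the accumulated bytes for every chunk.

```python
import base64
from typing import Iterable


def combine_base64_chunks(chunks: Iterable[str]) -> str:
    """Decode each base64 chunk, join the raw bytes, and re-encode them once.

    Hypothetical helper: assumes each chunk is a valid, separately padded
    base64 string.
    """
    # validate=True makes malformed input raise instead of being silently skipped.
    raw = b"".join(base64.b64decode(chunk, validate=True) for chunk in chunks)
    # b64encode returns a single line with no embedded newlines.
    return base64.b64encode(raw).decode("ascii")


combined = combine_base64_chunks(["U3RyaW5nQQ==", "U3RyaW5nQg=="])
print(combined)  # U3RyaW5nQVN0cmluZ0I=
```

Decoding the result with base64.b64decode(combined) gives back the bytes of StringAStringB, the same as in the step-by-step version above.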
