I'm trying to write a Python script for processing audio data stored on S3.
I fetch an S3 object with:
def grabAudio(filename, directory):
    obj = s3client.get_object(Bucket=bucketname, Key=directory+'/'+filename)
    return obj['Body'].read()
Accessing the data using
print(obj['Body'].read())
yields the correct audio data, so it's reading from the bucket just fine.
When I try to then use this data in my audio processing library (pydub), it fails:
audio = AudioSegment.from_wav(grabAudio(filename, bucketname))
Traceback (most recent call last):
File "split_audio.py", line 38, in <module>
audio = AudioSegment.from_wav(grabAudio(filename, bucketname))
File "C:\Users\jmk_m\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pydub\audio_segment.py", line 544, in from_wav
return cls.from_file(file, 'wav', parameters)
File "C:\Users\jmk_m\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pydub\audio_segment.py", line 456, in from_file
file.seek(0)
AttributeError: 'bytes' object has no attribute 'seek'
What is the format of the object coming in from S3? A byte array, I presume? If so, is there a way to parse it as a .wav without saving it to disk first? I'd like to avoid writing to disk.
Also open to alternative audio processing libraries.
Upvotes: 3
Views: 3271
Thanks to Linas for linking a similar issue, and to Jiaaro for the answer there:
import io
s = io.BytesIO(y['data'])
AudioSegment.from_file(s).export(x, format='mp3')
Wrapping the bytes in io.BytesIO gives pydub a seekable file-like object, which lets me pull directly from the bucket into memory with:
obj = s3client.get_object(Bucket=bucketname, Key=customername+'/'+filename)
data = io.BytesIO(obj['Body'].read())
audio = AudioSegment.from_file(data)
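For anyone who wants to see why this works without touching S3 or pydub, here is a minimal sketch using only the standard library. It assumes the payload is a plain PCM WAV; make_wav_bytes and wav_info are hypothetical helper names made up for the illustration, and the generated silent clip just stands in for what obj['Body'].read() returns.

```python
import io
import wave


def make_wav_bytes(n_frames=800, framerate=8000):
    """Build a tiny silent mono 16-bit WAV in memory.

    Hypothetical stand-in for the bytes returned by obj['Body'].read().
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)        # mono
        w.setsampwidth(2)        # 16-bit samples
        w.setframerate(framerate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def wav_info(raw):
    """Parse WAV bytes from memory; returns (channels, framerate, frames)."""
    stream = io.BytesIO(raw)     # seekable file-like wrapper around the bytes
    with wave.open(stream, "rb") as w:
        return w.getnchannels(), w.getframerate(), w.getnframes()


raw = make_wav_bytes()
print(hasattr(raw, "seek"))              # False: plain bytes cannot seek
print(hasattr(io.BytesIO(raw), "seek"))  # True: BytesIO behaves like a file
print(wav_info(raw))                     # (1, 8000, 800)
```

The same wrapper is what the pydub call above relies on: from_file calls seek(0) on whatever it is given, so it needs a file-like object rather than raw bytes.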
Upvotes: 3