Reputation: 1589
I have an audio file.
I have a bunch of [start, end] time stamp segments.
WHAT I WANT TO ACHIEVE:
Say audio is 6:00 minutes long.
Segments I have are : [[0.0,4.0], [8.0,12.0], [16.0,20.0], [24.0,28.0]]
After I pass these two to sox + python, the output should be audio that is still 6 minutes long but has sound only in the times covered by the segments.
I.e., I want to pass the time stamps and the original audio to SOX + python so that an audio file is generated with everything silenced out except the portions corresponding to the passed segments.
I couldn't achieve the above, but came somewhat close to the opposite. After days of googling I have this:
UPDATED, MORE CONCISE CODE + EXAMPLE:
A sox command template that takes the padding and trimming arguments, like this:
SOX__SILENCE = 'sox "{inputaudio}" -c 1 "{outputaudio}" {padding}{trimming}'
Random Segments for testing:
# random segments:
A= [[0.0,16.0]]
b=[[1.0,2.0]]
z= [[1.6, 8.3], [13.2, 33.7], [35.0,38.0], [42.0,51.0], [70.2,73.7], [90.0,99.2], [123.0,131.1]]
q= [[0.0,4.0], [8.0,12.0], [16.0,20.0], [24.0,28.0]]
A small python script to generate padding and trimming.
PADDING:
def get_pad_pattern_from_timestamps(my_segments):
    padding = 'pad'
    for segment in my_segments:
        duration = str(segment[1] - segment[0])
        padding = padding + ' ' + duration + '@' + str(segment[0])
    return padding
print get_pad_pattern_from_timestamps(A)
print get_pad_pattern_from_timestamps(b)
print get_pad_pattern_from_timestamps(z)
print get_pad_pattern_from_timestamps(q)
OUTPUT from ^:
pad 16.0@0.0
pad 1.0@1.0
pad 6.7@1.6 20.5@13.2 3.0@35.0 9.0@42.0 3.5@70.2 9.2@90.0 8.1@123.0
pad 4.0@0.0 4.0@8.0 4.0@16.0 4.0@24.0 4.0@32.0 4.0@40.0
TRIMMING:
def get_trimm_pattern_from_timestamps(my_segments):
    trimming = ''
    for segment in my_segments:
        duration = str(segment[1] - segment[0])
        trimming = trimming + ' trim 0 ' + str(segment[0]) + ' 0 ' + duration + ' ' + duration
    return trimming
print get_trimm_pattern_from_timestamps(A)
print get_trimm_pattern_from_timestamps(b)
print("\n")
print get_trimm_pattern_from_timestamps(z)
print("\n")
print get_trimm_pattern_from_timestamps(q)
print("\n")
OUTPUT FROM TRIMMING:
trim 0 0.0 0 16.0 16.0
trim 0 1.0 0 1.0 1.0
trim 0 1.6 0 6.7 6.7 trim 0 13.2 0 20.5 20.5 trim 0 35.0 0 3.0 3.0 trim 0 42.0 0 9.0 9.0 trim 0 70.2 0 3.5 3.5 trim 0 90.0 0 9.2 9.2 trim 0 123.0 0 8.1 8.1
trim 0 0.0 0 4.0 4.0 trim 0 8.0 0 4.0 4.0 trim 0 16.0 0 4.0 4.0 trim 0 24.0 0 4.0 4.0 trim 0 32.0 0 4.0 4.0 trim 0 40.0 0 4.0 4.0
RUNNING SOX using the above outputs from a terminal:
Padding:
sox dinners.mp3 -c 1 testlongpad.mp3 pad 4.0@0.0 4.0@8.0 4.0@16.0 4.0@24.0
Trimming:
sox dinners.mp3 -c 1 testrim.mp3 trim 0 0.0 0 16.0 16.0
Pad and trim:
sox dinners.mp3 -c 1 testlongpadtrim.mp3 pad 4.0@0.0 4.0@8.0 4.0@16.0 4.0@24.0 trim 0 0.0 0 4.0 4.0 trim 0 8.0 0 4.0 4.0 trim 0 16.0 0 4.0 4.0 trim 0 24.0 0 4.0 4.0
If S is the set of my segments, then NS is everything else. In the approach above I'm passing NS, and NS is getting removed from the audio.
What I want to achieve is still the same, but in a different way: I want to pass S so that only the portions of audio corresponding to S are retained.
PS: My question is very specific. I am new to audio processing and unsure how to proceed, so kindly don't close the question as being too broad. I'd be happy to provide more details for clarification. Lastly, this is not a homework question. This is for a personal project.
Sample Audio : https://www.dropbox.com/s/1p27nfwney42ka2/LAZY_SALON_-03-_Hot_Dinners.mp3?dl=0
Sample Segments ([[start, end], ...]): [[1.6, 8.3], [13.2, 33.7], [35.0, 38.0], [42.0, 51.0], [70.2, 73.7], [90.0, 99.2], [123.0, 131.1]]
So when these time stamps are passed to sox/python with audio, everything in the audio except those portions in the supplied segments should be silenced out.
Upvotes: 4
Views: 1239
Reputation: 47099
I would probably solve this with a zsh script and awk.
If the times are given like this, in a file named bits:
1.6 8.3
13.2 33.7
35.0 38.0
42.0 51.0
70.2 73.7
90.0 99.2
123.0 131.1
Calculate the silence bits like this:
awk '{ print $1, $2, $1 - p; p = $2 }' bits
Output:
1.6 8.3 1.6
13.2 33.7 4.9
35.0 38.0 1.3
42.0 51.0 4
70.2 73.7 19.2
90.0 99.2 16.3
123.0 131.1 23.8
You are now able to generate the desired command line with something like this:
args="sox "
m=file.mp3
awk '{ print $1, $2, $1 - p; p = $2 }' bits |
while read s e n; do
  args+="\"|sox -n -p trim 0 $n\" "
  args+="\"|sox $m -p trim $s =$e remix 1\" "
done
args+="out.wav"
echo "$args"
Pipe it into /bin/sh to execute:
... | sh
The output from sox should now be in out.wav.
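Since the question asks for Python, here is a rough sketch of how the same command line could be assembled there instead of in zsh and awk. It only mirrors the loop above and returns the command as a string; it does not run sox. The function name build_sox_command is made up for illustration, and the file names file.mp3 and out.wav are simply the ones used above. It assumes the segments are sorted and do not overlap, and I have not checked how sox behaves when a gap is zero seconds long.

```python
def build_sox_command(segments, source="file.mp3", output="out.wav"):
    """Assemble the sox command line produced by the zsh/awk loop above.

    build_sox_command is a hypothetical helper; segments are assumed to be
    sorted, non-overlapping [start, end] pairs in seconds.
    """
    parts = ["sox"]
    previous_end = 0.0
    for start, end in segments:
        # Length of the silent gap before this segment (same as $1 - p in awk).
        # Rounding hides floating point noise such as 4.899999999999999.
        gap = round(start - previous_end, 6)
        parts.append('"|sox -n -p trim 0 {}"'.format(gap))
        parts.append('"|sox {} -p trim {} ={} remix 1"'.format(source, start, end))
        previous_end = end
    parts.append(output)
    return " ".join(parts)


if __name__ == "__main__":
    z = [[1.6, 8.3], [13.2, 33.7], [35.0, 38.0]]
    print(build_sox_command(z))
```

The resulting string is meant to be handed to a shell, just like the echo output above.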
Upvotes: 1
Reputation: 1589
I was able to implement this with a workaround.
See : create new list from list of lists in python by grouping
What I did was create a new list containing the regions between the segments and pass that to sox. At the moment, whatever I pass to sox gets removed, so I calculated the regions to be removed and passed those instead. It worked pretty well.
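The complement step described above could look roughly like this. This is a minimal sketch, not the exact code from the linked question: the function name complement_segments and the total_duration argument (the length of the audio in seconds, e.g. 360.0 for the 6:00 example) are assumptions made for illustration.

```python
def complement_segments(segments, total_duration):
    """Return the [start, end] regions of [0, total_duration] NOT covered
    by the given segments (hypothetical helper, times in seconds)."""
    gaps = []
    cursor = 0.0  # end of the last region we want to keep
    for start, end in sorted(segments):
        if start > cursor:
            # Everything between the previous segment and this one is a gap.
            gaps.append([cursor, start])
        cursor = max(cursor, end)
    if cursor < total_duration:
        # Trailing region after the last segment.
        gaps.append([cursor, total_duration])
    return gaps


if __name__ == "__main__":
    q = [[0.0, 4.0], [8.0, 12.0], [16.0, 20.0], [24.0, 28.0]]
    print(complement_segments(q, 360.0))
    # [[4.0, 8.0], [12.0, 16.0], [20.0, 24.0], [28.0, 360.0]]
```

The returned list can then be fed to the existing pad and trim helpers unchanged.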
The solution is still inverted, but I don't have to change anything in the sox command.
I won't accept my own answer. I'm hoping someone can come up with a solution that modifies the sox commands rather than recalculating the segments like I did.
Upvotes: 1