Reputation: 584
I have audio from a video that I've loaded with PyTorch. Given a starting index and ending index corresponding to the video segment of interest, along with the video FPS and audio sampling rate, how would I go about extracting the slice of audio that matches the segment of interest of the video?
My intuition is to convert frames to time via:
start_time = frame_start / fps
end_time = frame_end / fps
then convert time to sample position with:
start_sample = int(math.floor(start_time * sr))
end_sample = int(math.floor(end_time * sr))
Is this correct? Or is there something I'm missing? I'm worried that there will be loss of information since I'm converting the samples into ints with floor.
Upvotes: 0
Views: 1018
Reputation: 4148
Let's say you have
import numpy as np
fs = 44100 # audio sampling frequency (Hz)
vfr = 24 # video frame rate (frames per second)
frame_start = 10 # index of first frame
frame_end = 20 # index of last frame
audio = np.arange(44100) # one second of dummy audio as an ndarray
You can calculate the points in time at which you want to slice the audio:
time_start = frame_start / vfr
time_end = frame_end / vfr # or (frame_end + 1) / vfr for inclusive cut
and then to which samples those points in time correspond:
sample_start_idx = int(time_start * fs)
sample_end_idx = int(time_end * fs)
It's up to you whether you want to be extra precise and treat the audio for a given frame as starting half a frame before that frame's timestamp and ending half a frame after it. In that case use:
time_start = np.clip((frame_start - 0.5) / vfr, 0, np.inf)
time_end = (frame_end + 0.5) / vfr
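Putting the steps above together, here is a minimal sketch of a helper that slices the audio for a frame range. The function name `slice_audio_for_frames` and the `inclusive` flag are made up for illustration, and it assumes time runs along the last axis of the array. It multiplies before dividing so that results which should be whole numbers are not pushed just below an integer by floating-point rounding.

```python
import math

import numpy as np


def slice_audio_for_frames(audio, frame_start, frame_end, vfr, fs, inclusive=True):
    """Return the samples of `audio` that cover the given video frames.

    Hypothetical helper: assumes time is the last axis of `audio`.
    """
    # For an inclusive cut the audio must run to the start of the next frame.
    last = frame_end + 1 if inclusive else frame_end
    # Multiply first, divide second, then floor to a sample index.
    sample_start_idx = math.floor(frame_start * fs / vfr)
    sample_end_idx = math.floor(last * fs / vfr)
    return audio[..., sample_start_idx:sample_end_idx]


audio = np.arange(44100)  # one second of dummy audio
segment = slice_audio_for_frames(audio, 10, 20, vfr=24, fs=44100)
print(segment[0], len(segment))
```

The same slicing expression should also work on a PyTorch tensor whose last dimension is time, since it only relies on basic indexing.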
Upvotes: 1
Reputation: 153
Your solution is just fine. Flooring shifts each boundary by less than one sample period, which at a sample rate of 16000 is 1/16000 = 6.25e-05 seconds. In the example below the audio/video desync is about 4.166e-05 seconds, which is orders of magnitude below what human ears are able to discern.
import math
fps = 60
frame_start = 121
frame_end = 181
sr = 16000
start_time = frame_start / fps
end_time = frame_end / fps
start_sample = int(math.floor(start_time * sr))
end_sample = int(math.floor(end_time * sr))
print(end_time-end_sample/sr) # 4.166666666671759e-05
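To check that the offset stays below one sample period for every frame, not just this one, here is a rough sketch that uses exact rational arithmetic. The helper name `frame_to_sample` is hypothetical, and it assumes an integer fps and sample rate.

```python
from fractions import Fraction


def frame_to_sample(frame, fps, sr):
    """Floor of frame / fps * sr, computed exactly (hypothetical helper)."""
    return (Fraction(frame) * sr) // Fraction(fps)


fps = 60
sr = 16000

# Offset in seconds between each frame's true start time and its floored sample.
errors = [
    Fraction(f, fps) - Fraction(frame_to_sample(f, fps, sr), sr)
    for f in range(1000)
]
worst = max(errors)

print(float(worst))           # largest offset over the first 1000 frames
print(worst < Fraction(1, sr))  # always less than one sample period
```

Because the flooring is applied to each boundary independently, the offset does not accumulate over the length of the video.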
Upvotes: 1