Randomly select timeframes as tuples from a list of time points

Question

I have several time points taken from a video with some max time length (T). These points are stored in a list of lists as follows:

time_pt_nested_list = [
 [0.0, 6.131, 32.892, 43.424, 46.969, 108.493, 142.69, 197.025, 205.793, 244.582, 248.913, 251.518, 258.798, 264.021, 330.02, 428.965],
 [11.066, 35.73, 64.784, 151.31, 289.03, 306.285, 328.7, 408.274, 413.64],
 [48.447, 229.74, 293.19, 333.343, 404.194, 418.575],
 [66.37, 242.16, 356.96, 424.967],
 [78.711, 358.789, 403.346],
 [84.454, 373.593, 422.384],
 [102.734, 394.58],
 [158.534],
 [210.112],
 [247.61],
 [340.02],
 [365.146],
 [372.153]]

Each list above is associated with some probability; I'd like to randomly select points from each list according to its probability to form n tuples of contiguous time spans, such as the following:

[(0, t1), (t1, t2), (t2, t3), ..., (t_{n-1}, T)]

where n is specified by the user. The returned tuples should contain only floating-point numbers from the nested list above. I want points in the first list to have the highest probability of being sampled and appearing in the returned tuples, points in the second list a slightly lower probability, and so on. The exact probabilities are not important, but it would be nice if the user could pass a parameter that controls how fast the probability decays as the list index increases.

The returned tuples are timeframes that should exactly cover the entire video and should not overlap. 0 and T may not necessarily appear in time_pt_nested_list (but they may). Are there nice ways to implement this? I would be grateful for any insightful suggestions.

For example, if the user inputs 6 as the number of subclips, this would be one possible output:

[(0.0, 32.892), (32.892, 64.784), (64.784, 229.74), (229.74, 306.285), (306.285, 418.575), (418.575, 437.47)]

All numbers in the tuples come from time_pt_nested_list, except 0.0 and 437.47. (0.0 does appear in the list here, but it may not in other cases.) Here 437.47 is the length of the video, which is also given and may not appear in the list.

anon01 · Accepted Answer

This is simpler than it may look. You really just need to sample n - 1 interior points from your sublists, each with a row-dependent sampling probability. The sampled points can then be sorted by time and paired up, together with the two endpoints, to construct your tuples.

import numpy as np

# user params
n = 6
prob_falloff_param = 0.2

# flatten to (row index, time point) pairs, ordered by time
lin_list = sorted(
    [(idx, el) for idx, row in enumerate(time_pt_nested_list) for el in row],
    key=lambda x: x[1],
)

# endpoints required, excluded from random selection process
t0 = lin_list.pop(0)[1]
T = lin_list.pop(-1)[1]
arr = np.array(lin_list)

# define row weights; prob_falloff_param controls how fast they decay with row index
weights = np.exp(-prob_falloff_param * arr[:, 0]**2)
norm_weights = weights / np.sum(weights)

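# choose (weighted) random points, create tuple list:
# p=norm_weights is what makes the selection weighted; without it the draw is uniform
random_points = sorted(np.random.choice(arr[:, 1], size=(n - 1), replace=False, p=norm_weights))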

time_arr = [t0, *random_points, T]
output = list(zip(time_arr, time_arr[1:]))
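
To get a feel for how prob_falloff_param shapes the row weights used above, here is a small sketch. The helper name row_weights is hypothetical (it is not part of the snippet above); it only reproduces the exp(-falloff * idx**2) formula for a given number of rows and prints the normalized weights for a few falloff values.

```python
import numpy as np

def row_weights(num_rows, falloff):
    """Unnormalized weight exp(-falloff * idx**2) for each row index (hypothetical helper)."""
    idx = np.arange(num_rows)
    return np.exp(-falloff * idx**2)

# Compare a flat profile (falloff = 0) with slower and faster decay
for falloff in (0.0, 0.05, 0.2):
    w = row_weights(5, falloff)
    print(falloff, np.round(w / w.sum(), 3))
```

A falloff of 0 gives every row the same weight, and larger values concentrate the weight on the first rows. Note that these are per-point weights: a row with many points is still picked more often overall than a row with a single point at a similar weight.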

Example outputs (sampling is random, so results differ between runs):

# n = 6 
[(0.0, 78.711),
 (78.711, 84.454),
 (84.454, 158.534),
 (158.534, 210.112),
 (210.112, 372.153),
 (372.153, 428.965)]

# n = 12
[(0.0, 6.131),
 (6.131, 43.424),
 (43.424, 64.784),
 (64.784, 84.454),
 (84.454, 102.734),
 (102.734, 210.112),
 (210.112, 229.74),
 (229.74, 244.582),
 (244.582, 264.021),
 (264.021, 372.153),
 (372.153, 424.967),
 (424.967, 428.965)]
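
The outputs above start and end at the earliest and latest points in the list. The question also mentions that the video length is given separately and that 0 and T may not be in the list, so a variant with fixed endpoints might look like the sketch below. The function name sample_timeframes and its parameters are assumptions made for illustration, and the sketch assumes all time points are distinct and that there are at least n - 1 points strictly between 0 and the video length.

```python
import numpy as np

def sample_timeframes(nested, n, total_length, falloff=0.2, rng=None):
    """Return n contiguous (start, end) tuples covering [0, total_length].

    Hypothetical wrapper: the endpoints are fixed at 0.0 and total_length,
    and the n - 1 interior cut points are drawn from `nested` with
    row-dependent weights exp(-falloff * row_index**2).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Flatten to (row index, time) pairs, keeping only strictly interior points
    pairs = [(idx, t) for idx, row in enumerate(nested) for t in row
             if 0.0 < t < total_length]
    if n - 1 > len(pairs):
        raise ValueError("not enough interior points for n timeframes")
    rows = np.array([p[0] for p in pairs], dtype=float)
    times = np.array([p[1] for p in pairs], dtype=float)
    weights = np.exp(-falloff * rows**2)
    probs = weights / weights.sum()
    # Weighted draw without replacement, then sort the cut points by time
    chosen = rng.choice(times, size=n - 1, replace=False, p=probs)
    cuts = [0.0, *sorted(chosen.tolist()), float(total_length)]
    return list(zip(cuts, cuts[1:]))

# Small made-up data set, seeded for repeatability
points = [[0.0, 6.131, 32.892, 43.424], [11.066, 35.73], [48.447]]
frames = sample_timeframes(points, n=4, total_length=60.0,
                           rng=np.random.default_rng(0))
print(frames)
```

With a very large falloff the weights of late rows can underflow to zero, in which case the draw without replacement may fail if fewer than n - 1 points keep a nonzero probability.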
