How to generate random numbers from a given range with provided distribution probability

Question

Suppose I have a list of files, each with a given weight (a larger number indicates a higher probability):

fileA (8)
fileB (1)
fileC (3)
fileD (4)
...

How can I generate a random sequence that respects these relative probabilities, much like the shuf tool does?

The length of the sequence might be shorter than the number of files in the set. This should be part of the input to a shell function, so a lightweight solution using traditional Unix tools is preferred; relying on heavy libraries or platforms (like Matlab) is not a good fit.

John1024 · Accepted Answer

To select a file randomly with relative probabilities given by:

$ cat file
fileA (8)
fileB (1)
fileC (3)
fileD (4)

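Use this, which prints each file name as many times as its weight and then picks one of those lines at random: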

$ awk -F'[ ()]' '{for (i=1;i<=$(NF-1);i++) print $1}' file |shuf | head -n1
fileD
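
If the weights are large, or several draws are needed at once, repeating every name by its weight gets wasteful. A possible alternative, sketched here under some assumptions, is to let awk walk the cumulative weights and draw directly. The function name weighted_pick and its optional COUNT argument are made up for illustration, and the draws are made with replacement, so the same file can appear more than once. It also assumes file names contain no spaces or parentheses, as in the sample above.

```shell
# weighted_pick FILE [COUNT]
# Hypothetical helper: prints COUNT names (default 1) from FILE, drawn with
# replacement, each with probability proportional to its "(weight)".
weighted_pick() {
    awk -F'[ ()]+' -v n="${2:-1}" '
        BEGIN { srand() }                  # seeded from the clock
        NF >= 2 {                          # lines shaped like: name (weight)
            name[++k] = $1
            total += $2
            cum[k] = total                 # running sum of the weights
        }
        END {
            if (total <= 0) exit 1         # nothing usable to draw from
            for (j = 0; j < n; j++) {
                r = rand() * total         # uniform in [0, total)
                i = 1
                while (i < k && r >= cum[i]) i++
                print name[i]
            }
        }' "$1"
}
```

Calling weighted_pick file 5 should print five names, with fileA turning up about half the time for the weights above (8 out of 16). Note that srand() with no argument is seeded from the time of day, so two calls within the same second may return the same sequence.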
