Suppose I have a list of files, each with a given weight (a larger number means a higher probability).
How can I generate a random sequence that reflects these relative probabilities, just like the shuf
tool does?
The sequence may be shorter than the number of files. This will be part of a shell function, so a lightweight solution using traditional Unix tools is preferred; heavy libraries or platforms (like Matlab) are not a good fit.
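Since shuf prints each input line at most once, a weighted shuffle *without* replacement may be closer to what is asked. A minimal sketch, swapping in a technique that neither answer uses: give each file an exponential random key `-ln(U)/w` (U uniform, w its weight), sort the keys ascending and keep the first K names. The chance that a file comes first is its weight divided by the total weight, and the same holds for the remaining files at every later position. The function name `weighted_shuf` is hypothetical, and it assumes input lines of the form `name weight` with positive weights.

```shell
# weighted_shuf FILE [K]  (hypothetical helper)
# Prints up to K distinct names from FILE (lines "name weight") in weighted
# random order, without replacement. K omitted or 0 means "print all".
weighted_shuf() {
    file=$1
    k=${2:-0}
    # Exponential key per name: smaller keys for larger weights on average.
    # 1 - rand() lies in (0, 1], so log() never sees 0. Zero weights are skipped.
    awk 'BEGIN { srand() }
         $2 > 0 { printf "%.12f\t%s\n", -log(1 - rand()) / $2, $1 }' "$file" |
        sort -n -k1,1 |   # smallest key first
        cut -f2 |         # drop the key, keep the name
        if [ "$k" -gt 0 ]; then head -n "$k"; else cat; fi
}
```

For example, `weighted_shuf weights 3` prints three different names from the weights file used below. Note that awk's `srand()` without an argument seeds from the clock in seconds, so two calls within the same second return the same order.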
awk to the rescue!
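$ awk -v n=10 '
    # a[i] = running total of the weights up to line i, v[i] = name on line i
    {a[NR]=a[NR-1]+$2; v[NR]=$1}
    END{srand();
        for(j=1;j<=n;j++) {
          r=rand()*a[NR]             # uniform in [0, total weight)
          for(i=1;i<=NR;i++)         # first line whose running total exceeds r
            if(r<a[i]) {print v[i]; break}
        }}' weights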
$ cat weights
fileA 8
fileB 1
fileC 3
fileD 4
Usage: this draws 10 samples (with replacement) according to the relative weights:
$ awk -v n=10 '...' weights
fileA
fileA
fileA
fileA
fileA
fileA
fileA
fileD
fileD
fileA
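To check that the cumulative-weight approach really follows the weights, one option is to draw many samples and tally them. A minimal sketch, assuming the same `name weight` input format; `weighted_sample` is a hypothetical wrapper around the awk program above.

```shell
# weighted_sample FILE N  (hypothetical wrapper)
# Prints N names drawn with replacement, each with probability
# weight / total weight, using a running total of the weights.
weighted_sample() {
    awk -v n="$2" '
        { total += $2; cum[NR] = total; name[NR] = $1 }
        END {
            srand()
            for (j = 1; j <= n; j++) {
                r = rand() * total             # uniform in [0, total)
                for (i = 1; i <= NR; i++)      # first running total above r
                    if (r < cum[i]) { print name[i]; break }
            }
        }' "$1"
}
```

With the weights file above, `weighted_sample weights 16000 | sort | uniq -c` should report counts near 8000, 1000, 3000 and 4000 for fileA to fileD (weights 8:1:3:4 out of 16). A name with weight 0 is never printed, because its running total equals the previous one.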
To select a file randomly with relative probabilities given by:
$ cat file
fileA (8)
fileB (1)
fileC (3)
fileD (4)
Print each name as many times as its weight, shuffle the resulting list, and take the first line:
$ awk -F'[ ()]' '{for (i=1;i<=$(NF-1);i++) print $1}' file |shuf | head -n1
fileD
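The same expand-and-shuffle idea can be wrapped in a function that picks K lines instead of one, using `shuf -n` (GNU coreutils). A minimal sketch: `pick_expanded` is a hypothetical name, it assumes integer weights in the `name (weight)` format above, and because each unit of weight is a separate line, a name can appear more than once when K > 1.

```shell
# pick_expanded FILE [K]  (hypothetical helper)
# Repeats each name once per unit of its integer weight, then lets
# shuf pick K of those lines at random (default K = 1).
pick_expanded() {
    # Splitting on runs of space and parentheses turns "fileA (8)" into
    # $1 = "fileA", $2 = "8".
    awk -F'[ ()]+' '{ for (i = 1; i <= $2; i++) print $1 }' "$1" |
        shuf -n "${2:-1}"
}
```

For example, `pick_expanded file 3` prints three draws from the list above. Memory use grows with the sum of the weights, so this suits small integer weights best.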