Reputation: 676
In a bash script I have 3 keywords, and I want to assign each one a random value such that the three values add up to 100.
On every run of the script the keywords should get new random values, but in the end they should still add up to 100.
So like this:
A=10
B=70
C=20
and the second run:
A=35
B=57
C=8
and so on.
I know I can use RANDOM directly from bash, or shuf, but I cannot wrap my head around always getting values that eventually add up to 100.
Upvotes: 1
Views: 448
Reputation: 203684
$ cat tst.sh
#!/usr/bin/env bash
genNums() {
local cnt="$1" tot="$2" random
# Print "cnt" random numbers that when summed total the value in
# "tot", such that each output position has the same distribution.
# The awk part generates "cnt" random numbers but the distribution
# of those numbers is not even. The first number generated is
# in the range 0-tot but the second number is always in a
# more restricted range, except in the 1/(tot+1) case where
# the first number was zero, and so on.
# The shuf randomizes the order in which those numbers are output,
# thereby evening out the distribution across positions. If you don't
# have shuf or just prefer an all-awk solution then within the call to
# awk you can save the generated numbers in an indexed array and
# then use a Knuth Shuffle to output them randomly, see
# https://stackoverflow.com/a/27386400/1745001.
# Without a seed for srand(), awk can produce the same output on
# multiple calls within the same second as srand() can use the current
# seconds since the epoch as the seed value. We don't want to just
# use $RANDOM for the seed as its range is just 0-32767 and so
# would lead to repetitions every 32768 or fewer calls of the script.
# "-N4 -tu4" leads to random numbers being output; changing 4 to 8
# causes a lot of repetition, presumably due to some truncation in srand().
# Could alternatively use random=$(( $(date '+%s') + $RANDOM )).
random=$(od -An -N4 -tu4 < /dev/urandom)
awk -v cnt="$cnt" -v max="$tot" -v seed="$random" '
BEGIN {
srand(seed)
for (i=1; i<cnt; i++) {
val = int( rand() * (max+1) )
print val
max = max - val
}
val = max   # the remainder becomes the last number
print val
}
' |
shuf
}
readarray -t arr < <(genNums 3 100)
printf 'A=%s\n' "${arr[0]}"
printf 'B=%s\n' "${arr[1]}"
printf 'C=%s\n' "${arr[2]}"
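The comments above mention doing the shuffle inside awk instead of piping to shuf. A minimal sketch of that idea follows, assuming the same seeding approach as the script above; the function name genNumsAwk is made up for illustration and is not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical all-awk variant of genNums: same generation step, but the
# numbers are kept in an array and shuffled inside awk rather than by shuf.
genNumsAwk() {
    local cnt="$1" tot="$2" seed
    seed=$(od -An -N4 -tu4 < /dev/urandom)
    awk -v cnt="$cnt" -v max="$tot" -v seed="$seed" '
        BEGIN {
            srand(seed)
            for (i = 1; i < cnt; i++) {
                vals[i] = int(rand() * (max + 1))   # 0..max inclusive
                max -= vals[i]
            }
            vals[cnt] = max                # the remainder fills the last slot
            # Fisher-Yates (Knuth) shuffle of vals[1..cnt]
            for (i = cnt; i > 1; i--) {
                j = int(rand() * i) + 1    # random index in 1..i
                tmp = vals[i]; vals[i] = vals[j]; vals[j] = tmp
            }
            for (i = 1; i <= cnt; i++) print vals[i]
        }
    '
}

genNumsAwk 3 100
```

The output has the same shape as genNums, so it can be read with readarray in the same way.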
Here are some individual runs to show random numbers being generated:
$ ./tst.sh
A=10
B=58
C=32
$ ./tst.sh
A=17
B=56
C=27
$ ./tst.sh
A=28
B=33
C=39
and here are the averages over 10000 runs, to show that each position gets the same share on average:
$ for ((i=1; i<=10000; i++)); do ./tst.sh; done |
awk 'BEGIN{FS=OFS="="} {s[$1]+=$2} END{for (i in s) print i, s[i]/(NR/length(s))}'
A=33.8476
B=33.6365
C=33.5159
Hmm. I just tried running that again but this time printing out the distribution of 0-100 values output by doing:
$ for ((i=1; i<=10000; i++)); do ./tst.sh; done |
awk 'BEGIN{FS=OFS="="} {s[$1]+=$2; n[$2]++}
END{
for (i in s) print i, s[i]/(NR/length(s));
print "---";
for (i=0;i<=100;i++) print i, n[i]+0
}'
and I got this:
0=1112
1=927
2=811
3=811
4=732
5=689
6=615
7=589
8=586
9=566
10=548
11=495
12=528
13=506
14=488
15=470
16=448
17=488
18=466
19=423
20=406
21=388
22=384
23=398
24=381
25=381
26=370
27=366
28=353
29=351
30=339
31=363
32=315
33=312
34=315
35=293
36=270
37=264
38=273
39=280
40=290
41=269
42=268
43=281
44=260
45=283
46=303
47=251
48=248
49=225
50=261
51=243
52=239
53=236
54=221
55=217
56=229
57=184
58=171
59=193
60=183
61=230
62=198
63=190
64=183
65=190
66=195
67=178
68=203
69=162
70=160
71=157
72=141
73=152
74=148
75=158
76=164
77=170
78=145
79=135
80=145
81=156
82=135
83=127
84=136
85=131
86=118
87=140
88=131
89=117
90=116
91=128
92=105
93=112
94=127
95=110
96=114
97=112
98=111
99=117
100=95
which indicates you'll get numbers at the lower end of the range more frequently than at the higher end. Not sure why...
I just figured it out: the output I'm getting is to be expected, since in a batch of 3 numbers that add up to 100 the 1st number can be anywhere from 0-100, but the 2nd will almost always be drawn from a smaller range (and so have a smaller value), and the 3rd from an even smaller range than that. Hence the skew towards smaller numbers: 2 out of every 3 numbers generated in every iteration come from a range smaller than 0-100.
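If the dependence on generation order is unwanted, a different technique (not the one used in the script above) is "stars and bars": pick 2 distinct bar positions among tot+2 slots, and read the three parts off as the gaps. The sketch below assumes GNU shuf is available; the name uniformSplit3 is hypothetical. Every (A, B, C) triple is then equally likely, so no shuffle step is needed. Small values still come up more often than large ones for each part, because that is inherent in three non-negative numbers summing to a fixed total.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split "tot" into 3 non-negative parts so that every
# possible (A, B, C) triple is equally likely ("stars and bars").
uniformSplit3() {
    local tot="$1" p q
    # Pick 2 distinct bar positions among tot+2 slots, smallest first.
    { read -r p; read -r q; } < <(shuf -i "1-$((tot + 2))" -n 2 | sort -n)
    # The slots before, between and after the bars are the three parts.
    printf '%s\n' "$((p - 1))" "$((q - p - 1))" "$((tot + 2 - q))"
}

readarray -t arr < <(uniformSplit3 100)
printf 'A=%s\nB=%s\nC=%s\n' "${arr[@]}"
```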
Upvotes: 3
Reputation: 19585
You can do it as suggested earlier:
#!/usr/bin/env bash
for _ in {1..10};do
((A=RANDOM%100, B=RANDOM%(100-A), C=100-A-B))
echo A=$A B=$B C=$C A+B+C=$((A+B+C))
done
But the distribution is not even.
Let's demonstrate the unevenness:
#!/usr/bin/env bash
declare -i NA=0 NB=0 NC=0
declare -i SA=0 SB=0 SC=0
declare -- AA="" AB="" AC=""
for _ in {1..1000000};do
((A = RANDOM % 100, B = RANDOM % (100 - A), C = 100 - A - B))
((SA += A, NA += 1))
((SB += B, NB += 1))
((SC += C, NC += 1))
done
AA="$(bc <<<"scale=12;$SA/$NA")"
AB="$(bc <<<"scale=12;$SB/$NB")"
AC="$(bc <<<"scale=12;$SC/$NC")"
LC_NUMERIC=POSIX
printf 'Average A=%.2f\n' "$AA"
printf 'Average B=%.2f\n' "$AB"
printf 'Average C=%.2f\n' "$AC"
Result:
Average A=49.47
Average B=24.78
Average C=25.75
A has a random range of 0..99, with an average of 99/2=49.5
B has a random range of 0..(99-A), on average 0..(99-49.5=49.5), with an average of 49.5/2=24.75
C is the remainder 100-A-B, with an average of 100-49.5-24.75=25.75
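The expected averages can also be computed exactly rather than sampled. The sketch below assumes, as an idealisation, that RANDOM % n is perfectly uniform on 0..n-1; the function name expected_averages is made up for illustration. It reproduces the measured averages shown above.

```shell
#!/usr/bin/env bash
# Exact expected values for A=RANDOM%100, B=RANDOM%(100-A), C=100-A-B,
# assuming (idealised) that RANDOM % n is uniform on 0..n-1.
expected_averages() {
    local -i a sumA=0 sumB2=0
    for ((a = 0; a < 100; a++)); do
        sumA=$((sumA + a))          # A takes each value 0..99 equally often
        sumB2=$((sumB2 + 99 - a))   # twice the mean of B, which is 0..(99-a)
    done
    # Work in hundredths to stay in integer arithmetic.
    local -i ea=$((sumA * 100 / 100)) eb=$((sumB2 * 100 / 200))
    local -i ec=$((10000 - ea - eb))
    printf 'E[A]=%d.%02d E[B]=%d.%02d E[C]=%d.%02d\n' \
        $((ea / 100)) $((ea % 100)) \
        $((eb / 100)) $((eb % 100)) \
        $((ec / 100)) $((ec % 100))
}

expected_averages   # prints E[A]=49.50 E[B]=24.75 E[C]=25.75
```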
Now change the random range of A to really be 0..100 by computing RANDOM modulo 101:
#!/usr/bin/env bash
declare -i NS=0 SA=0 SB=0 SC=0
declare -- AA="" AB="" AC=""
while [ $NS -lt 100000 ]; do
((A = RANDOM % 101, B = RANDOM % (101 - A), C = 100 - A - B, SA += A, SB += B, SC += C, NS += 1))
done
AA="$(bc <<<"scale=12;$SA/$NS")"
AB="$(bc <<<"scale=12;$SB/$NS")"
AC="$(bc <<<"scale=12;$SC/$NS")"
LC_NUMERIC=POSIX
printf 'Average A=%.2f\n' "$AA"
printf 'Average B=%.2f\n' "$AB"
printf 'Average C=%.2f\n' "$AC"
Averages are now A=50, B=25, C=25:
Average A=49.99
Average B=25.01
Average C=25.00
Now how do you make it so that A, B and C have an equal range of 0-100?
We need to randomize the position of A, B and C in the random generation, so each one has an equal chance of a 0-100 or 0-50 range:
#!/usr/bin/env bash
declare -i NS=0 SA=0 SB=0 SC=0
declare -- AA="" AB="" AC=""
while [ $NS -lt 100000 ]; do
((A = RANDOM % 101, B = RANDOM % (101 - A), C = 100 - A - B ))
# Random flip A, B, C to even random ranges
case $((RANDOM % 6)) in
1) # ACB
((_ = B, B = C, C = _))
;;
2) # BAC
((_ = A, A = B, B = _))
;;
3) # BCA
((_ = A, A = B, B = C, C = _))
;;
4) # CAB
((_ = A, A = C, C = B, B = _ ))
;;
5) # CBA
((_ = A, A = C, C = _ ))
;;
esac
((SA += A, SB += B, SC += C, NS += 1))
done
AA="$(bc <<<"scale=12;$SA/$NS")"
AB="$(bc <<<"scale=12;$SB/$NS")"
AC="$(bc <<<"scale=12;$SC/$NS")"
LC_NUMERIC=POSIX
printf 'Average A=%.2f\n' "$AA"
printf 'Average B=%.2f\n' "$AB"
printf 'Average C=%.2f\n' "$AC"
Result with even range distribution:
Average A=33.24
Average B=33.36
Average C=33.39
It is now indeed an average of 100/3 for each one of A, B and C.
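The six-way case statement works for exactly three values. For any number of values, the same position randomization can be sketched with a Fisher-Yates shuffle. The helper name shuffle_words is hypothetical, and the slight modulo bias of RANDOM % n is ignored here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print its arguments one per line in random order
# (Fisher-Yates shuffle; the small modulo bias of RANDOM % n is ignored).
shuffle_words() {
    local -a a=("$@")
    local -i i j
    local tmp
    for ((i = ${#a[@]} - 1; i > 0; i--)); do
        j=$((RANDOM % (i + 1)))    # random index in 0..i
        tmp=${a[i]}
        a[i]=${a[j]}
        a[j]=$tmp
    done
    printf '%s\n' "${a[@]}"
}

A=$((RANDOM % 101))
B=$((RANDOM % (101 - A)))
C=$((100 - A - B))
readarray -t parts < <(shuffle_words "$A" "$B" "$C")
printf 'A=%s B=%s C=%s\n' "${parts[@]}"
```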
If you have read down to here, you have just had a tiny illustration of how non-trivial randomness, statistics, distributions and chances are.
And let's face it: in the code above, the 0..100 range is still not evenly distributed, because Bash's $RANDOM gives a value in the 0..32767 range.
That is exactly 2¹⁵=32768 possible values, which do not distribute evenly under modulo 101 because, as you may guess, 32768 is not a multiple of 101.
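The size of that bias can be worked out without drawing a single random number. The sketch below counts how many of the 32768 possible $RANDOM values land on each residue; the function name modulo_bias is made up for illustration.

```shell
#!/usr/bin/env bash
# Count how many of the 32768 possible RANDOM values (0..32767) map to
# each residue modulo n. No random numbers are drawn.
modulo_bias() {
    local -i n=$1 total=32768
    local -i base=$((total / n)) extra=$((total % n))
    if ((extra == 0)); then
        printf 'all %d residues occur %d times\n' "$n" "$base"
    else
        # The first "extra" residues get one additional hit each.
        printf 'residues 0..%d occur %d times, residues %d..%d occur %d times\n' \
            "$((extra - 1))" "$((base + 1))" "$extra" "$((n - 1))" "$base"
    fi
}

modulo_bias 101   # residues 0..43 occur 325 times, residues 44..100 occur 324 times
```

So with modulo 101 the values 0..43 are each about 0.3% more likely than the values 44..100.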
I leave you the puzzle of obtaining an evenly distributed random value in the 0..100 range using Bash's $RANDOM :)
Solution to the evenly distributed random range:
To obtain a random number that is evenly distributed over a given number of values from Bash's $RANDOM variable, which has 32768 different values, we take the largest multiple of our range size that fits in 32768 and discard any $RANDOM that is at or above it.
If you have a 6-sided die but no 4-sided die available: whenever you roll more than 4, roll again until the result is less than or equal to 4.
If you have a 20-sided die (D2 players know about this one) but no 6-sided die available: reroll anything above 18, then reduce the roll modulo 6, or group the rolls in threes so that a roll of 1, 2 or 3 gives you 1, and so on.
This is implemented in the ranged_random() function below:
#!/usr/bin/env bash
# Get an evenly distributed random integer in range
# @Params:
# $1: The lower bound of range or upper bound if single argument
# $2: The optional upper bound of range
# @Output:
# >&1: The evenly distributed random integer in range
ranged_random() {
local -i min=0 max=0
case $# in
2) ((min = $1, max = $2)) ;;
1) ((max = $1)) ;;
*)
return 1 # at least upper bound is required
;;
esac
[ $min -gt $max ] && return 2 # no random possible in an empty (negative) range; min == max yields min
local -i rand_count=$((max - min + 1))
[ $rand_count -gt 32768 ] && return 3 # Bash's $RANDOM overflow
local -i rand_max=$((32768 - 32768 % rand_count))
local -i rnd
# Draw again until the value falls below rand_max (valid draws are 0..rand_max-1)
while ((rnd=RANDOM, rnd >= rand_max)); do :; done
echo $((rnd % rand_count + min))
}
declare -i NS=0 SA=0 SB=0 SC=0
declare -- AA="" AB="" AC=""
while [ $NS -lt 100 ]; do
A=$(ranged_random 100)
B=$(ranged_random $((100 - A)))
C=$((100 - A - B))
# Random flip A, B, C to even random ranges
case $(ranged_random 5) in
1) # ACB
((_ = B, B = C, C = _))
;;
2) # BAC
((_ = A, A = B, B = _))
;;
3) # BCA
((_ = A, A = B, B = C, C = _))
;;
4) # CAB
((_ = A, A = C, C = B, B = _))
;;
5) # CBA
((_ = A, A = C, C = _))
;;
esac
((SA += A, SB += B, SC += C, NS += 1))
done
AA="$(bc <<<"scale=12;$SA/$NS")"
AB="$(bc <<<"scale=12;$SB/$NS")"
AC="$(bc <<<"scale=12;$SC/$NS")"
LC_NUMERIC=POSIX
printf 'Average A=%.2f\n' "$AA"
printf 'Average B=%.2f\n' "$AB"
printf 'Average C=%.2f\n' "$AC"
Upvotes: 2
Reputation: 360153
This allows the input of any total to reach and any number of addends to be requested. Addends are allowed to range from 0 to the total - 1 (except for the last which could be as much as the total).
Note that there's no error checking. If the requested total is greater than the maximum value of $RANDOM (32767) and the number of addends is greater than 1, none of the addends but the last can be greater than 32767.
To run it:
$ ./sum_nums 100 3
69 25 6 = 100
$ ./sum_nums 10000 7
1200 541 8198 25 1 3 32 = 10000
#!/bin/bash
sum_nums () {
local remaining=$1
local count=$2
local total=0 num i
for (( i = 1; i < count; i++ ))
do
(( num = $RANDOM % remaining ))
printf '%s ' "$num"
(( remaining -= num ))
(( total += num ))
done
(( total += remaining ))
printf '%s = %s\n' "$remaining" "$total"
}
sum_nums "$@"
It wouldn't be difficult to add the ability to limit the range of the addends.
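As a minimal sketch of one such limit, a lower bound on every addend can be handled by reserving that minimum for each addend, splitting what is left, and adding the minimum back. The function name sum_nums_min and its third parameter are hypothetical, and an upper bound is not handled here.

```shell
#!/usr/bin/env bash
# Hypothetical variant of sum_nums with a lower bound on every addend:
# reserve "min" per addend, split what is left, then add "min" back.
sum_nums_min() {
    local -i total=$1 count=$2 min=$3
    local -i remaining=$((total - count * min)) num i
    local -a nums=()
    ((remaining >= 0)) || return 1           # the bound cannot be met
    for ((i = 1; i < count; i++)); do
        num=$((RANDOM % (remaining + 1)))    # 0..remaining inclusive
        nums+=("$((num + min))")
        remaining=$((remaining - num))
    done
    nums+=("$((remaining + min))")           # the last addend takes the rest
    echo "${nums[*]}"
}

sum_nums_min 100 3 10
```

Like the original, this assumes the amount being split fits within the range of $RANDOM.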
Upvotes: 1
Reputation: 17268
I don't know what range is allowed for each number; from your examples it can be at least 70.
Say the R1 value is 46. The range for R2 is then 100 - 46 - 1 = 53
(minus 1 because R3 must have a value). Say the R2 value is 13.
R3 is then 100 - (46 + 13) = 41.
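The steps above can be sketched as follows, assuming every number must be at least 1. The upper limit of 98 for R1 and the function name split_positive are assumptions made for this illustration, and the modulo bias of $RANDOM is ignored.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: three numbers that are each at least 1 and sum to 100.
split_positive() {
    local -i r1 r2 r3 range
    r1=$((RANDOM % 98 + 1))       # 1..98, leaves at least 1 each for R2 and R3
    range=$((100 - r1 - 1))       # largest value R2 may take
    r2=$((RANDOM % range + 1))    # 1..range
    r3=$((100 - r1 - r2))         # the rest, always at least 1
    printf '%d %d %d\n' "$r1" "$r2" "$r3"
}

read -r R1 R2 R3 < <(split_positive)
printf 'R1=%s R2=%s R3=%s\n' "$R1" "$R2" "$R3"
```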
Upvotes: 1