w00t

Reputation: 676

Bash script randomize values that add up to given total

In a bash script I have 3 keywords, and I want to assign each of them a random value such that the three values add up to 100.
On every run of the script the keywords should get new random values, but they must still add up to 100.

So like this:

A=10
B=70
C=20

and the second run:

A=35
B=57
C=8

and so on.
I know I can use RANDOM directly from bash, or shuf, but I cannot wrap my head around always getting values that eventually add up to 100.

Upvotes: 1

Views: 448

Answers (4)

Ed Morton

Reputation: 203684

$ cat tst.sh
#!/usr/bin/env bash

genNums() {
    local cnt="$1" tot="$2" random

    # Print "cnt" evenly distributed random numbers that when summed
    # total the value in "tot".

    # The awk part generates "cnt" random numbers but the distribution
    # of those numbers is not even. The first number generated is
    # in the range 0-tot but the second number is always in a
    # more restricted range, except in the 1/(tot+1) case where
    # the first number was zero, and so on.

    # The shuf randomizes the order those numbers are output thereby
    # creating an even distribution of the numbers. If you don't have
    # shuf or just prefer an all awk solution then within the call to
    # awk you can save the generated numbers in an indexed array and
    # then use a Knuth Shuffle to output them randomly, see
    # https://stackoverflow.com/a/27386400/1745001.

    # Without an explicit seed, srand() can produce the same output on
    # multiple calls within the same second, as it can use the current
    # seconds since the epoch as the seed value. We don't want to just
    # use $RANDOM for the seed as its range is just 0-32767 and so
    # would lead to repetitions every 32768 or fewer calls of the script.
    # "-N4 -tu4" leads to random numbers output; changing 4 to 8 causes a
    # lot of repetition, presumably due to some truncation in srand().
    # Could alternatively use random=$(( $(date '+%s') + $RANDOM )).
    random=$(od -An -N4 -tu4 < /dev/urandom)

    awk -v cnt="$cnt" -v max="$tot" -v seed="$random" '
        BEGIN {
            srand(seed)
            for (i=1; i<cnt; i++) {
                val = int( rand() * (max+1) )
                print val
                max = max - val
            }
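            # The last number is whatever is left of the total
            # (the original "max - min" relied on "min" being unset, i.e. 0).
            val = max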
            print val
        }
    ' |
    shuf
}

readarray -t arr < <(genNums 3 100)

printf 'A=%s\n' "${arr[0]}"
printf 'B=%s\n' "${arr[1]}"
printf 'C=%s\n' "${arr[2]}"

Here are some individual runs to show random numbers being generated:

$ ./tst.sh
A=10
B=58
C=32

$ ./tst.sh
A=17
B=56
C=27

$ ./tst.sh
A=28
B=33
C=39

and here are the averages of 10000 runs, showing that each position averages about 100/3:

$ for ((i=1; i<=10000; i++)); do ./tst.sh; done |
    awk 'BEGIN{FS=OFS="="} {s[$1]+=$2} END{for (i in s) print i, s[i]/(NR/length(s))}'
A=33.8476
B=33.6365
C=33.5159

Hmm. I just tried running that again, but this time also printing how often each value from 0 to 100 was output, by doing:

$ for ((i=1; i<=10000; i++)); do ./tst.sh; done |
    awk 'BEGIN{FS=OFS="="} {s[$1]+=$2; n[$2]++}
    END{
        for (i in s) print i, s[i]/(NR/length(s));
        print "---";
        for (i=0;i<=100;i++) print i, n[i]+0
    }'

and I got this:

0=1112
1=927
2=811
3=811
4=732
5=689
6=615
7=589
8=586
9=566
10=548
11=495
12=528
13=506
14=488
15=470
16=448
17=488
18=466
19=423
20=406
21=388
22=384
23=398
24=381
25=381
26=370
27=366
28=353
29=351
30=339
31=363
32=315
33=312
34=315
35=293
36=270
37=264
38=273
39=280
40=290
41=269
42=268
43=281
44=260
45=283
46=303
47=251
48=248
49=225
50=261
51=243
52=239
53=236
54=221
55=217
56=229
57=184
58=171
59=193
60=183
61=230
62=198
63=190
64=183
65=190
66=195
67=178
68=203
69=162
70=160
71=157
72=141
73=152
74=148
75=158
76=164
77=170
78=145
79=135
80=145
81=156
82=135
83=127
84=136
85=131
86=118
87=140
88=131
89=117
90=116
91=128
92=105
93=112
94=127
95=110
96=114
97=112
98=111
99=117
100=95

which indicates you'll get numbers at the lower end of the range more frequently than at the higher end. Not sure why...

I just figured it out, and the output I'm getting is to be expected: in a batch of 3 numbers that add up to 100, the 1st number can be anywhere from 0-100, but the 2nd will almost always be in a smaller range (and so have a smaller value), and the 3rd in an even smaller range than that. Hence the skew towards smaller numbers, since 2 out of every 3 numbers generated in every iteration will be in a smaller range than 0-100.
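
The skew described above comes from drawing the numbers one after another from a shrinking range. One way around it, sketched here as an assumption and not as part of the script above, is the "stars and bars" approach: pick cnt-1 distinct bar positions among tot+cnt-1 slots and use the gaps between the bars as the values, which makes every combination equally likely. The function name split_total is hypothetical, the $RANDOM % slots draw still carries a slight modulo bias, and it only works while tot+cnt-1 is at most 32768.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the answer above): split "tot" into "cnt"
# non-negative integers by placing cnt-1 bars among tot+cnt-1 slots.
split_total() {
    local -i cnt=$1 tot=$2 slots i pos prev=0
    local -A used=()
    slots=$(( tot + cnt - 1 ))

    # Pick cnt-1 distinct bar positions in 1..slots (duplicates are redrawn).
    while (( ${#used[@]} < cnt - 1 )); do
        pos=$(( RANDOM % slots + 1 ))
        used[$pos]=1
    done

    # Walk the slots in order; the gap before each bar is one value.
    for (( i = 1; i <= slots; i++ )); do
        if [[ -n ${used[$i]:-} ]]; then
            echo $(( i - prev - 1 ))
            prev=$i
        fi
    done
    # The gap after the last bar is the final value.
    echo $(( slots - prev ))
}

readarray -t arr < <(split_total 3 100)
printf 'A=%s\nB=%s\nC=%s\n' "${arr[@]}"
```

Because the values come out in slot order and not in draw order, no extra shuffle step should be needed with this approach.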

Upvotes: 3

Léa Gris

Reputation: 19585

You can do it as suggested earlier:

#!/usr/bin/env bash
for _ in {1..10};do
((A=RANDOM%100, B=RANDOM%(100-A), C=100-A-B))
echo A=$A B=$B C=$C A+B+C=$((A+B+C))
done

But the distribution is not even. Let's demonstrate the unevenness:

#!/usr/bin/env bash
declare -i NA=0 NB=0 NC=0
declare -i SA=0 SB=0 SC=0
declare -- AA="" AB="" AC=""
for _ in {1..1000000};do
  ((A = RANDOM % 100, B = RANDOM % (100 - A), C = 100 - A - B))
  ((SA += A, NA += 1))
  ((SB += B, NB += 1))
  ((SC += C, NC += 1))
done
AA="$(bc <<<"scale=12;$SA/$NA")"
AB="$(bc <<<"scale=12;$SB/$NB")"
AC="$(bc <<<"scale=12;$SC/$NC")"
LC_NUMERIC=POSIX
printf 'Average A=%.2f\n' "$AA"
printf 'Average B=%.2f\n' "$AB"
printf 'Average C=%.2f\n' "$AC"

Result:

Average A=49.47
Average B=24.78
Average C=25.75

  • A has a random range of 0..99 with an average of 99/2=49.5
  • B has a random range of 0..(99-A), on average 0..(99-49.5=49.5), with an average of 49.5/2=24.75
  • C is whatever remains, so its average is 100-49.5-24.75=25.75
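
The averages can also be worked out exactly, without sampling. The following is a small sketch that assumes every remainder of RANDOM is equally likely (which is only approximately true); the variable names are made up for this illustration. It keeps everything in hundredths so that plain integer arithmetic is enough.

```bash
#!/usr/bin/env bash
# Exact averages for A = RANDOM % 100, B = RANDOM % (100 - A), C = 100 - A - B,
# assuming each remainder is equally likely. Values are kept in hundredths.
declare -i a sum_a=0 sum_b_twice=0
for (( a = 0; a <= 99; a++ )); do
    sum_a=$(( sum_a + a ))                    # A takes each value 0..99 once
    sum_b_twice=$(( sum_b_twice + 99 - a ))   # for this A, B averages (99 - a) / 2
done

# Dividing by the 100 possible values of A and scaling by 100 cancel out.
declare -i avg_a=$sum_a
declare -i avg_b=$(( sum_b_twice / 2 ))
declare -i avg_c=$(( 10000 - avg_a - avg_b ))

show() { printf 'Average %s=%d.%02d\n' "$1" $(( $2 / 100 )) $(( $2 % 100 )); }
show A "$avg_a"   # Average A=49.50
show B "$avg_b"   # Average B=24.75
show C "$avg_c"   # Average C=25.75
```

These exact values sit close to the measured averages in the result above.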

Now change the random range of A to really be 0..100 by taking RANDOM modulo 101:

#!/usr/bin/env bash
declare -i NS=0 SA=0 SB=0 SC=0
declare -- AA="" AB="" AC=""
while [ $NS -lt 100000 ]; do
  ((A = RANDOM % 101, B = RANDOM % (101 - A), C = 100 - A - B, SA += A, SB += B, SC += C, NS += 1))
done
AA="$(bc <<<"scale=12;$SA/$NS")"
AB="$(bc <<<"scale=12;$SB/$NS")"
AC="$(bc <<<"scale=12;$SC/$NS")"
LC_NUMERIC=POSIX
printf 'Average A=%.2f\n' "$AA"
printf 'Average B=%.2f\n' "$AB"
printf 'Average C=%.2f\n' "$AC"

The averages are now about A=50, B=25, C=25:

Average A=49.99
Average B=25.01
Average C=25.00

Now how do you make it so that A, B and C have equal range of 0-100?

We need to randomize the position of A, B and C in the random generation, so that each one has an equal chance of getting the full 0-100 range or one of the reduced ranges that are left over:

#!/usr/bin/env bash
declare -i NS=0 SA=0 SB=0 SC=0
declare -- AA="" AB="" AC=""
while [ $NS -lt 100000 ]; do
  ((A = RANDOM % 101, B = RANDOM % (101 - A), C = 100 - A - B ))
  # Random flip A, B, C to even random ranges
  case $((RANDOM % 6)) in
    1) # ACB
      ((_ = B, B = C, C = _))
      ;;
    2) # BAC
      ((_ = A, A = B, B = _))
      ;;
    3) # BCA
      ((_ = A, A = B, B = C, C = _))
      ;;
    4) # CAB
      ((_ = A, A = C, C = B, B = _ ))
      ;;
    5) # CBA
      ((_ = A, A = C, C = _ ))
      ;;
  esac
  ((SA += A, SB += B, SC += C, NS += 1))
done

AA="$(bc <<<"scale=12;$SA/$NS")"
AB="$(bc <<<"scale=12;$SB/$NS")"
AC="$(bc <<<"scale=12;$SC/$NS")"
LC_NUMERIC=POSIX
printf 'Average A=%.2f\n' "$AA"
printf 'Average B=%.2f\n' "$AB"
printf 'Average C=%.2f\n' "$AC"

Result with even range distribution:

Average A=33.24
Average B=33.36
Average C=33.39

It is now indeed an average of 100/3 for each one of A, B and C.

If you have read this far, you have just had a small illustration of why randomness, statistics, distributions and chance are non-trivial.

And let's face it: in the code above, the range 0..100 is still not evenly distributed, because Bash's $RANDOM gives a value in the 0..32767 range.

That is exactly 2¹⁵=32768 possible values, which do not distribute evenly modulo 101 because, as you may guess, 32768 is not a multiple of 101.
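
To make that unevenness concrete, here is a small arithmetic sketch (the variable names are made up for illustration) that counts how many of the 32768 possible $RANDOM values land on each remainder modulo 101:

```bash
#!/usr/bin/env bash
# How unevenly do 32768 source values spread over 101 remainders?
declare -i values=32768 range=101
declare -i base=$(( values / range ))    # every remainder occurs at least this often
declare -i extra=$(( values % range ))   # this many remainders occur one extra time

printf 'remainders 0..%d: %d source values each\n' $(( extra - 1 )) $(( base + 1 ))
printf 'remainders %d..%d: %d source values each\n' "$extra" $(( range - 1 )) "$base"
# prints:
#   remainders 0..43: 325 source values each
#   remainders 44..100: 324 source values each
```

The bias is small here (325 versus 324), but it grows as the wanted range gets closer to 32768.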

I leave it to you as a puzzle: obtain an even distribution of random values in the 0..100 range using Bash's $RANDOM :)

Solution to the evenly distributed random range:

To obtain a random number that is evenly distributed over a given number of values from Bash's $RANDOM variable, which has 32768 different values, we take the largest multiple of the number of wanted values that fits into 32768 and discard any $RANDOM that falls outside it. Since $RANDOM starts at 0, that means discarding any value greater than or equal to that multiple.

If you have a 6-sided die but no 4-sided die available:

  • 6/4 = 1.5, integer part of 1.5 is 1
  • 1 × 4 = 4

If you roll more than 4 on your 6-sided die, you roll again until the result is less than or equal to 4.

If you have a 20-sided die (D2 players know about this one) but no 6-sided die available:

  • 20/6 = 3.33333~, integer part = 3
  • wanted 6 sides × 3 = 18
  • You discard all rolls over 18
  • then you reduce the kept roll (<= 18) to 6 equally likely outcomes, for example by grouping in threes (taking the roll modulo 6 works just as well):
  • roll of 1,2,3 gives you 1
  • roll of 4,5,6 gives you 2
  • ...
  • roll of 16,17,18 gives you 6

This is the implementation in the ranged_random() function below:

#!/usr/bin/env bash
# Get an evenly distributed random integer in range
# @Params:
# $1: The lower bound of range or upper bound if single argument
# $2: The optional upper bound of range
# @Output:
# >&1: The evenly distributed random integer in range
ranged_random() {
  local -i min=0 max=0
  case $# in
    2) ((min = $1, max = $2)) ;;
    1) ((max = $1)) ;;
    *)
      return 1 # at least upper bound is required
      ;;
  esac
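  # A negative range is an error; min == max is allowed and simply yields min,
  # so that a call like "ranged_random 0" (when A is already 100) still prints a value
  [ $min -gt $max ] && return 2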
  local -i rand_count=$((max - min + 1))
  [ $rand_count -gt 32768 ] && return 3 # Bash's $RANDOM overflow
  local -i rand_max=$((32768 - 32768 % rand_count))
  local -i rnd
  # Draw again until the value is below rand_max; $RANDOM starts at 0,
  # so rand_max itself must be rejected as well
  while ((rnd = RANDOM, rnd >= rand_max)); do :; done
  echo $((rnd % rand_count + min))
}

declare -i NS=0 SA=0 SB=0 SC=0
declare -- AA="" AB="" AC=""
while [ $NS -lt 100 ]; do
  A=$(ranged_random 100)
  B=$(ranged_random $((100 - A)))
  C=$((100 - A - B))
  # Random flip A, B, C to even random ranges
  case $(ranged_random 5) in
    1) # ACB
      ((_ = B, B = C, C = _))
      ;;
    2) # BAC
      ((_ = A, A = B, B = _))
      ;;
    3) # BCA
      ((_ = A, A = B, B = C, C = _))
      ;;
    4) # CAB
      ((_ = A, A = C, C = B, B = _))
      ;;
    5) # CBA
      ((_ = A, A = C, C = _))
      ;;
  esac
  ((SA += A, SB += B, SC += C, NS += 1))
done

AA="$(bc <<<"scale=12;$SA/$NS")"
AB="$(bc <<<"scale=12;$SB/$NS")"
AC="$(bc <<<"scale=12;$SC/$NS")"
LC_NUMERIC=POSIX
printf 'Average A=%.2f\n' "$AA"
printf 'Average B=%.2f\n' "$AB"
printf 'Average C=%.2f\n' "$AC"

Upvotes: 2

Dennis Williamson

Reputation: 360153

This accepts any total and any number of addends. Each addend can range from 0 to the total - 1 (except for the last, which could be as much as the total).

Note that there's no error checking. If the requested total is greater than the maximum value of $RANDOM (32767) and the number of addends is greater than 1, none of the addends but the last can be greater than 32767.

To run it:

$ ./sum_nums 100 3
69 25 6 = 100
$ ./sum_nums 10000 7
1200 541 8198 25 1 3 32 = 10000

#!/bin/bash

sum_nums () {
    local remaining=$1
    local count=$2
    # start the running total at 0 and keep the loop variables local
    local total=0 i num
    for (( i = 1; i < count; i++ ))
    do
        (( num = $RANDOM % remaining ))
        printf '%s ' "$num"
        (( remaining -= num ))
        (( total += num ))
    done
    (( total += remaining ))
    printf '%s = %s\n' "$remaining" "$total"
}

sum_nums "$@"

It wouldn't be difficult to add the ability to limit the range of the addends.
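
As a rough illustration of that last remark, here is one possible way to cap each addend. It is a sketch under assumptions, not part of the script above: the function name sum_nums_capped and the third argument (the cap) are hypothetical, the lower bound is fixed at 0, and $RANDOM's modulo bias is ignored.

```bash
#!/usr/bin/env bash
# Hypothetical variant: "count" addends that sum to "remaining",
# each between 0 and "cap".
sum_nums_capped() {
    local -i remaining=$1 count=$2 cap=$3
    local -i i left lo hi num
    # Impossible if even count addends at the cap cannot reach the total.
    (( count * cap >= remaining )) || return 1
    for (( i = 1; i <= count; i++ )); do
        left=$(( count - i ))                  # addends still to come after this one
        lo=$(( remaining - left * cap ))       # smallest value the rest can still make up for
        if (( lo < 0 )); then lo=0; fi
        hi=$(( remaining < cap ? remaining : cap ))
        num=$(( lo + RANDOM % (hi - lo + 1) )) # the last addend ends up with lo == hi
        printf '%s ' "$num"
        remaining=$(( remaining - num ))
    done
    printf '\n'
}

sum_nums_capped 100 3 50
```

The lower bound lo is what keeps the sum exact: each addend must be large enough that the addends still to come can cover the rest without exceeding the cap.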

Upvotes: 1

suspectus

Reputation: 17268

I don't know what range each number is allowed to have; from your examples it can be at least 70.

Solution

  • Get the first random number R1 and subtract it from 100. The amount left to distribute is 100-R1.
  • Get the second random number R2 (with range 1 to 100-R1-1).
  • Finally, the third number is just 100 - R1 - R2.

Example

R1 value is 46. The upper bound for R2 is 100 - 46 - 1 = 53
(minus 1 because R3 must have a value).

R2 value is 13.

R3 is then 100 - (46 + 13) = 41.
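
The steps above could be sketched in bash roughly as follows. The range for R1 is not stated in the steps, so 1..98 is an assumption made here to leave at least 1 each for R2 and R3; the function name three_parts is hypothetical as well.

```bash
#!/usr/bin/env bash
# Three positive numbers that add up to 100, following the steps above.
three_parts() {
    local -i r1 r2 r3
    r1=$(( RANDOM % 98 + 1 ))               # assumed range 1..98
    r2=$(( RANDOM % (100 - r1 - 1) + 1 ))   # 1..(100-R1-1), so R3 is at least 1
    r3=$(( 100 - r1 - r2 ))
    echo "$r1 $r2 $r3"
}

read -r A B C <<< "$(three_parts)"
printf 'A=%s\nB=%s\nC=%s\n' "$A" "$B" "$C"
```

As with the other sequential approaches in this thread, the first number tends to be the largest unless the three results are shuffled afterwards.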

Upvotes: 1
