Reputation: 931
I have an array that needs to be shuffled and I would like to optimize the speed of this algorithm by using Awk. I'm still relatively new to using Awk and I was trying to figure out the best way of simulating this algorithm. How can this properly be done?
Bash Shuffle:
shuffle() {
local size limit rand i
size=${#password[*]}
limit=$(( 32768 / size * size))
for ((i=size-1; i > 0; i--)); do
while (((rand=$RANDOM) >= limit)); do :; done
rand=$((rand % (i+1)))
tmp=${password[i]}
password[i]=${password[rand]}
password[rand]=$tmp
done
}
Awk Attempt:
shuffle() {
local size limit rand i
size=${#password[*]}
limit=$(( 32768 / size * size))
awk -v rand=$RANDOM 'BEGIN {
srand(rand);
for(i=size-1; i>0; i--) {
while(rand >= limit);
rand=rand % i + 1;
tmp=password[i];
password[i]=password[rand];
password[rand]=tmp;
}
}'
}
Upvotes: 1
Views: 1367
Reputation:
Here is an awk implementation of shuffle:
File a.awk:
function get_rand(max) {
return int(rand()*max)
}
function get_array_length(a) {
k=0
for( i in a) k++
return k
}
function arr2str(a) {
astr=""
for(i in a)
astr=((astr)(a[i])" ")
return astr
}
function shuffle_array(in_array) {
array_size=get_array_length(in_array);
## Initialize the random indexing array
for (i=1;i<=array_size;i++)
rand_select[i]=in_array[i]
ridsz=array_size
for(i=1;i<=array_size;i++) {
ridx=get_rand(ridsz)+1;
newarr[i]=rand_select[ridx];
rand_select[ridx]=rand_select[ridsz] ## Move last element, preserve indx
delete rand_select[ridsz];
ridsz--;
}
return arr2str(newarr);
}
BEGIN {
"date +%N"|getline rseed;
srand(rseed);
close("date +%N");
split(vstr, varr, " ");
split(shuffle_array(varr),shuf_varr, " ");
for (element in shuf_varr)
print "got:",shuf_varr[element]
}
Then call it like this: awk -vvstr="$(echo {1..10000})" -f /path/to/a.awk
It's not much optimized, I'll leave anything further up to you -- on my machine it's running at ~0.25 seconds per 10000 records.
Upvotes: 0
Reputation: 1871
As you are looking to improve speed... Instead of using either of these approaches, the "shuf" utility should provide the preferred implementation of random shuffle.
password=( $(printf '%s\0' "${password[@]}" | shuf -z | xargs -0) )
If security of the shuffle is a concern, use of an external random source is optional (at the possible expense of execution speed).
password=( $(printf '%s\0' "${password[@]}" | shuf -z --random-source=/dev/random | xargs -0) )
Upvotes: 0
Reputation: 241701
awk has the rand
function, which generates a random number between 0.0 and 1.0 (actually, strictly less than 1.0). To get a random integer in the range [0, i+1)
, use int(rand()*(i+1))
. I don't think srand
does what you think it does; srand
sets the "seed" for awk's random number generator, which avoids generating the same random number sequence every time you invoke awk
. Normally, the seed is set from something which changes frequently, like the time -- although that's not ideal -- or a value extracted from /dev/random
.
A couple of observations:
1) I understand that your loop
while (((rand=$RANDOM) >= limit)); do :; done
is attempting to avoid bias in the random number generated by $RANDOM
, since that number only has 16 bits and thus the bias might be noticeable. However, it will only avoid bias the first time through the loop, when i+1 == size
, since limit
was computed based on size
. After that, limit
will be the wrong value. You could improve the computation, or you could generate a random number with more random bits by using /dev/urandom
, but personally I'd just use the shuf
utility, which does what you want (randomly shuffles an input). Of course, that's more pragmatic than didactic; it's not as educational as writing your own shuffler.
2) That applies equally to the awk
solution (i.e., why use awk
when you can use shuf
?). But in any event, assigning the awk
variable rand
from the bash
magical $RANDOM
does not make rand
magical. awk
and bash
aren't linked in any way. (And you can't just use bash variables like limit
in an awk script either.)
Upvotes: 1