Reputation: 931

Shuffling with Awk

I have an array that needs to be shuffled and I would like to optimize the speed of this algorithm by using Awk. I'm still relatively new to using Awk and I was trying to figure out the best way of simulating this algorithm. How can this properly be done?

Bash Shuffle:

 shuffle() {

 local size limit rand i

 size=${#password[*]}
 limit=$(( 32768 / size * size))

 for ((i=size-1; i > 0; i--)); do
   while (((rand=$RANDOM) >= limit)); do :; done
   rand=$((rand % (i+1)))
   tmp=${password[i]}
   password[i]=${password[rand]}
   password[rand]=$tmp
done
}

Awk Attempt:

shuffle() {

local size limit rand i

size=${#password[*]}
limit=$(( 32768 / size * size))

awk -v rand=$RANDOM 'BEGIN {
  srand(rand);
  for(i=size-1; i>0; i--) {
    while(rand >= limit);
    rand=rand % i + 1;
    tmp=password[i];
    password[i]=password[rand];
    password[rand]=tmp;
  }
}'
}

Upvotes: 1

Answers (3)

user4401178

Reputation:

Here is an awk implementation of shuffle:

File a.awk:

function get_rand(max) {
return int(rand()*max)
}

function get_array_length(a) {
 k=0    
 for( i in a) k++
  return k
}

function arr2str(a) {
astr="" 
 for(i in a) 
  astr=((astr)(a[i])" ")  
return astr 
}

function shuffle_array(in_array) { 
array_size=get_array_length(in_array);
 ## Initialize the random indexing array
for (i=1;i<=array_size;i++) 
 rand_select[i]=in_array[i]
ridsz=array_size 
for(i=1;i<=array_size;i++) { 
 ridx=get_rand(ridsz)+1;
 newarr[i]=rand_select[ridx];
 rand_select[ridx]=rand_select[ridsz] ## Move last element, preserve indx
 delete rand_select[ridsz];
 ridsz--; 
}
return arr2str(newarr); 
}

BEGIN {
"date +%N"|getline rseed; 
srand(rseed);
close("date +%N");
split(vstr, varr, " "); 
split(shuffle_array(varr),shuf_varr, " ");
for (element in shuf_varr) 
  print "got:",shuf_varr[element]
}

Then call it like this: awk -vvstr="$(echo {1..10000})" -f /path/to/a.awk

It's not much optimized, I'll leave anything further up to you -- on my machine it's running at ~0.25 seconds per 10000 records.

Upvotes: 0

Michael Back

Reputation: 1871

As you are looking to improve speed... Instead of using either of these approaches, the "shuf" utility should provide the preferred implementation of random shuffle.

password=( $(printf '%s\0' "${password[@]}" | shuf -z | xargs -0) )

If security of the shuffle is a concern, use of an external random source is optional (at the possible expense of execution speed).

password=( $(printf '%s\0' "${password[@]}" | shuf -z --random-source=/dev/random | xargs -0) )

Upvotes: 0

rici

Reputation: 241701

awk has the rand function, which generates a random number between 0.0 and 1.0 (actually, strictly less than 1.0). To get a random integer in the range [0, i+1), use int(rand()*(i+1)). I don't think srand does what you think it does; srand sets the "seed" for awk's random number generator, which avoids generating the same random number sequence every time you invoke awk. Normally, the seed is set from something which changes frequently, like the time -- although that's not ideal -- or a value extracted from /dev/random.

A couple of observations:

1) I understand that your loop

while (((rand=$RANDOM) >= limit)); do :; done

is attempting to avoid bias in the random number generated by $RANDOM, since that number only has 16 bits and thus the bias might be noticeable. However, it will only avoid bias the first time through the loop, when i+1 == size, since limit was computed based on size. After that, limit will be the wrong value. You could improve the computation, or you could generate a random number with more random bits by using /dev/urandom, but personally I'd just use the shuf utility, which does what you want (randomly shuffles an input). Of course, that's more pragmatic than didactic; it's not as educational as writing your own shuffler.

2) That applies equally to the awk solution (i.e., why use awk when you can use shuf?). But in any event, assigning the awk variable rand from the bash magical $RANDOM does not make rand magical. awk and bash aren't linked in any way. (And you can't just use bash variables like limit in an awk script either.)

Upvotes: 1

Shuffling with Awk

Answers (3)

Related Questions