How to replace a list of numbers in one column with random numbers from another column in Bash

I have a tab-delimited file with two columns, like this:

5 6 14 22 23 25 27 84 85 88 89 94 95 98 100             6 94
6 8 17 20 193 205 209 284 294 295 299 304 305 307 406   205 284 307 406
2 10 13 40 47 58                                        2 13 40 87

The desired output should look like this:

5 6 14 22 23 25 27 84 85 88 89 94 95 98 100             14 27
6 8 17 20 193 205 209 284 294 295 299 304 305 307 406   6 209 299 305
2 10 13 23 40 47 58 87                                  10 23 40 58

I would like to replace the numbers in the 2nd column with random numbers drawn from the 1st column, keeping the same count of numbers in the 2nd column. For example, if there are four numbers in the 2nd column of a given row, the output must have four random numbers from the 1st column of that row, and so on.

I tried using AWK with split to build two arrays and replace every number in the 2nd column with numbers from the 1st column, but the result is not random. I have seen the rand() function, but I don't know exactly how to combine the two in a script. Is it possible to do this in Bash, or are there better ways to do it? Thanks in advance.

Upvotes: 1

Views: 310

Answers (3)

karakfa

Reputation: 67507

awk to the rescue!

$ awk -F'\t' 'function shuf(a,n)
                 {for(i=1;i<n;i++)
                    {j=i+int(rand()*(n+1-i));
                     t=a[i]; a[i]=a[j]; a[j]=t}}
             function join(a,n,x,s)
                  {for(i=1;i<=n;i++) {x=x s a[i]; s=" "}
                   return x}
             BEGIN{srand()}
                  {an=split($1,a," ");
                   shuf(a,an);
                   bn=split($2,b," ");
                   delete m; delete c; j=0;
                   for(i=1;i<=bn;i++) m[b[i]];
                   # pull elements from a up to the required sample size,
                   # skipping any that are in the previous sample set
                   for(i=1;i<=an && j<bn;i++) if(!(a[i] in m)) c[++j]=a[i];
                   cn=asort(c);
                   print $1 FS join(c,cn)}' file


5 6 14 22 23 25 27 84 85 88 89 94 95 98 100     85 94
6 8 17 20 193 205 209 284 294 295 299 304 305 307 406   20 205 294 295
2 10 13 23 40 47 58 87  10 13 47 87

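This shuffles the first column's array with the standard (Fisher-Yates) algorithm and then takes the required number of elements, with the additional requirement that the new sample does not intersect the existing sample set. The helper map m holds the existing sample set and is used for the "in" tests. Note that asort() is a GNU awk extension, so this needs gawk. The rest should be easy to read.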
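
To look at the shuffle-then-sample step in isolation, here is a minimal sketch. The helper name sample_k and its optional seed argument are hypothetical (they are not part of the answer above), and it settles only the first k positions of the shuffle, since those are the only ones printed.

```shell
#!/usr/bin/env bash
# sample_k: hypothetical helper, not taken from the answer above.
# Usage: sample_k "space separated list" K [SEED]
# Prints K distinct items picked at random from the list, on one line.
sample_k() {
    awk -v list="$1" -v k="$2" -v seed="${3:-$RANDOM}" '
        BEGIN {
            srand(seed)
            n = split(list, a, " ")
            if (k > n) k = n                       # at most n distinct items exist
            # Partial Fisher-Yates: settle positions 1..k only
            for (i = 1; i <= k; i++) {
                j = i + int(rand() * (n + 1 - i))  # uniform index in [i, n]
                t = a[i]; a[i] = a[j]; a[j] = t
            }
            for (i = 1; i <= k; i++) printf "%s%s", a[i], (i < k ? " " : "\n")
        }'
}

sample_k "5 6 14 22 23 25 27 84" 3
```

Seeding from $RANDOM instead of the bare srand() avoids getting the same sequence when the helper is called several times within one second, because srand() without an argument seeds from the time of day.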

Upvotes: 1

William Pursell

Reputation: 212278

Assuming that there is a tab delimiting the two columns, and each column is a space-delimited list:

awk 'BEGIN{srand()} 
    {n=split($1,a," "); 
    m=split($2,b," "); 
    printf "%s\t",$1; 
    for (i=1;i<=m;i++) 
        printf "%d%c", a[int(rand() * n) +1], (i == m) ? "\n" : " "
    }' FS=\\t input
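
The awk command above picks each index independently, so the same number can show up more than once in a row. If distinct picks are wanted, one possible sketch uses shuf. This assumes GNU coreutils is installed (shuf is not part of POSIX), and the function name resample is hypothetical.

```shell
#!/usr/bin/env bash
# resample: hypothetical helper name, not part of the answer above.
# Reads "col1<TAB>col2" lines on stdin and rewrites col2 with distinct picks from col1.
# Assumes GNU coreutils shuf is available.
resample() {
    local col1 col2 new
    local -a pool old
    while IFS=$'\t' read -r col1 col2; do
        read -r -a pool <<< "$col1"     # numbers available for sampling
        read -r -a old  <<< "$col2"     # only the count of these is used
        # shuf -e treats its arguments as input lines; -n caps how many it prints
        new=$(shuf -n "${#old[@]}" -e "${pool[@]}" | sort -n | paste -sd' ' -)
        printf '%s\t%s\n' "$col1" "$new"
    done
}

printf '2 10 13 40 47 58\t2 13 40 87\n' | resample
```

If the 2nd column has more numbers than the 1st, shuf simply prints every number from the 1st column once, so the output row ends up shorter than the input row.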

Upvotes: 1

TenG

Reputation: 4004

Try this:

# This can be an external file of course
# Note COL1 and COL2 are separated by a hard TAB (written as \t in the printf format)

printf '%s\t%s\n' \
   "5 6 14 22 23 25 27 84 85 88 89 94 95 98 100" "6 94" \
   "6 8 17 20 193 205 209 284 294 295 299 304 305 307 406" "205 284 307 406" \
   "2 10 13 40 47 58" "2 13 40 87" > d1.txt

# Loop to read each line; note: convert TAB to ':', though IFS could have been used

tr '\t' ':' < d1.txt | while read -r LINE
do
   # Get the 1st column data

   COL1=$( echo ${LINE} | cut -d':' -f1 )

   # Get col1 number of items

   NUM_COL1=$( echo ${COL1} | wc -w )

   # Get col2 number of items

   NUM_COL2=$( echo ${LINE} | cut -d':' -f2 | wc -w )

   # Now split col1 items into an array

   read -r -a COL1_NUMS <<< "${COL1}"


   COL2=" "

   # This loop runs until one new number has been picked for each COL2 item

   COUNT=0
   while [ ${COUNT} -lt ${NUM_COL2} ]
   do

      # Generate a random number to use as the random index for COL1

      COL1_IDX=${RANDOM}
      let "COL1_IDX %= ${NUM_COL1}"

      NEW_NUM=${COL1_NUMS[${COL1_IDX}]}

      # Check for duplicate (-w matches whole words only, so 20 does not match 205)

      DUP_FOUND=$( echo "${COL2}" | grep -w -- "${NEW_NUM}" )

      if [ -z "${DUP_FOUND}" ]
      then
         # Not a duplicate, increment loop counter and do next one

         let "COUNT = COUNT + 1 "

         # Add the random COL1 item to COL2

         COL2="${COL2} ${COL1_NUMS[${COL1_IDX}]}"
      fi
   done

   # Sort COL2

   COL2=$( echo ${COL2} | tr ' ' '\012' | sort -n | tr '\012' ' ' )

   # Print

   echo ${COL1} :: ${COL2}
done

Output:

5 6 14 22 23 25 27 84 85 88 89 94 95 98 100 :: 88 95
6 8 17 20 193 205 209 284 294 295 299 304 305 307 406 :: 20 299 304 305
2 10 13 40 47 58 :: 2 10 40 58
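
The duplicate check in the loop starts a grep process for every draw. A sketch of the same draw-and-reject idea that stays inside the shell is shown here. It assumes bash 4 or later for associative arrays, the function name pick_distinct is hypothetical, and it tracks the indices already used rather than the values.

```shell
#!/usr/bin/env bash
# pick_distinct: hypothetical helper; needs bash 4+ for associative arrays.
# Usage: pick_distinct COUNT ITEM...
# Prints COUNT items drawn from distinct positions, sorted numerically, on one line.
pick_distinct() {
    local want=$1; shift
    local -a pool=("$@")
    local -a picked=()
    local -A seen=()
    local idx
    # Cap the request so the rejection loop cannot run forever
    (( want > ${#pool[@]} )) && want=${#pool[@]}
    while (( ${#picked[@]} < want )); do
        idx=$(( RANDOM % ${#pool[@]} ))
        if [[ -z ${seen[$idx]} ]]; then     # position not used yet
            seen[$idx]=1
            picked+=("${pool[idx]}")
        fi
    done
    printf '%s\n' "${picked[@]}" | sort -n | paste -sd' ' -
}

pick_distinct 4 2 10 13 40 47 58
```

Because positions are tracked instead of values, a number that occurs twice in the 1st column could be picked twice. Capping the count also avoids the endless loop that a plain rejection loop runs into when more numbers are requested than exist.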

Upvotes: 0
