Simos Neopoulos

Reputation: 113

How to sort 2 arrays in bash

I want to sort 2 arrays at the same time. The arrays are the following: wordArray and numArray. Both are global.

These 2 arrays contain all the words (without duplicates) from a text file and the number of occurrences of each word.

Right now I am using Bubble Sort to sort both of them at the same time:

# Bubble Sort function
function bubble_sort {   
    local max=${#numArray[@]}
    size=${#numArray[@]}
    while ((max > 0))
    do
        local i=0
        while ((i < max))
        do
            if [ "$i" != "$(($size-1))" ]
            then
                if [ ${numArray[$i]} \< ${numArray[$((i + 1))]} ]
                then
                    local temp=${numArray[$i]}
                    numArray[$i]=${numArray[$((i + 1))]}
                    numArray[$((i + 1))]=$temp

                    local temp2=${wordArray[$i]}
                    wordArray[$i]=${wordArray[$((i + 1))]}
                    wordArray[$((i + 1))]=$temp2
                fi
            fi
            ((i += 1))
        done
        ((max -= 1))
    done
}

#Calling Bubble Sort function
bubble_sort "${numArray[@]}" "${wordArray[@]}"

But for some reason it doesn't sort them properly when the arrays are large.

Does anyone know what's wrong with it, or another approach to sort the words together with their corresponding number of appearances, with or without arrays?

This:

wordArray = (because, maybe, why, the)
numArray = (5, 12, 20, 13)

Must turn to this:

wordArray = (why, the, maybe, because)
numArray = (20, 13, 12, 5)

Someone recommended writing the two arrays side by side in a text file and sorting the file.

How would that work for this input:

1 Arthur
21 Zebra

to turn to this output:

21 Zebra
1 Arthur

Upvotes: 1

Views: 278

Answers (2)

tshiono

Reputation: 22022

Assuming the array elements do not contain tab or newline characters, how about:

#!/bin/bash

wordArray=(because maybe why the)                       # unsorted input from the question
numArray=(5 12 20 13)

tmp1=$(mktemp tmp.XXXXXX)                               # file to be sorted
tmp2=$(mktemp tmp.XXXXXX)                               # sorted result

for (( i = 0; i < ${#wordArray[@]}; i++ )); do
    echo "${numArray[i]}"$'\t'"${wordArray[i]}"         # write the number and word delimited by a tab character
done > "$tmp1"

sort -nrk1,1 "$tmp1" > "$tmp2"                          # sort the file by number in descending order

while IFS=$'\t' read -r num word; do                    # read the lines splitting by the tab character
    numArray_sorted+=("$num")                           # add the number to the array
    wordArray_sorted+=("$word")                         # add the word to the array
done < "$tmp2"

rm -- "$tmp1"                                           # unlink the temp file
rm -- "$tmp2"                                           # same as above

echo "${wordArray_sorted[@]}"                           # see the result
echo "${numArray_sorted[@]}"                            # same as above

Output:

why the maybe because
20 13 12 5

If you prefer not to create temp files, here is a version using process substitution, which avoids writing and reading temp files and should therefore run faster.

#!/bin/bash

wordArray=(because maybe why the)
numArray=(5 12 20 13)

while IFS=$'\t' read -r num word; do
    numArray_sorted+=("$num")
    wordArray_sorted+=("$word")
done < <(
    sort -nrk1,1 < <(
        for (( i = 0; i < ${#wordArray[@]}; i++ )); do
            echo "${numArray[i]}"$'\t'"${wordArray[i]}"
        done
    )
)

echo "${wordArray_sorted[@]}"
echo "${numArray_sorted[@]}"

Or simpler (using the suggestion by KamilCuk):

#!/bin/bash

wordArray=(because maybe why the)
numArray=(5 12 20 13)

while IFS=$'\t' read -r num word; do
    numArray_sorted+=("$num")
    wordArray_sorted+=("$word")
done < <(
    paste <(printf "%s\n" "${numArray[@]}") <(printf "%s\n" "${wordArray[@]}") | sort -nrk1,1
)

echo "${wordArray_sorted[@]}"
echo "${numArray_sorted[@]}"
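The question sorts the two global arrays in place, whereas the scripts above fill new `*_sorted` arrays. A minimal sketch of wrapping the same paste/sort pipeline in a function that overwrites the globals might look like this. The function name `sort_by_count` is hypothetical, and the same assumption applies: no element contains a tab or a newline.

```bash
#!/bin/bash

wordArray=(because maybe why the)
numArray=(5 12 20 13)

# Hypothetical helper (not from the original answer): sorts the global
# numArray numerically in descending order and reorders the global
# wordArray to match. Assumes no element contains a tab or a newline.
sort_by_count() {
    (( ${#numArray[@]} > 0 )) || return 0      # nothing to sort
    local -a sorted_nums=() sorted_words=()
    local num word
    while IFS=$'\t' read -r num word; do
        sorted_nums+=("$num")
        sorted_words+=("$word")
    done < <(
        paste <(printf '%s\n' "${numArray[@]}") <(printf '%s\n' "${wordArray[@]}") |
            sort -t $'\t' -k1,1nr              # numeric, reversed, on the count column
    )
    numArray=("${sorted_nums[@]}")
    wordArray=("${sorted_words[@]}")
}

sort_by_count
echo "${wordArray[@]}"
echo "${numArray[@]}"
```

With the input from the question this should print `why the maybe because` followed by `20 13 12 5`.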

Upvotes: 1

dan

Reputation: 5231

You need a numeric sort for the numbers. You can sort a single array numerically like this:

mapfile -t numArray < <(printf '%s\n' "${numArray[@]}" | sort -n)
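The same numeric-versus-string distinction is a likely reason the bubble sort in the question misorders larger inputs: inside `[ ]`, the `\<` operator compares strings, so `5` is not considered less than `12`. A small sketch of the difference (the variable names `string_result` and `numeric_result` are only for illustration):

```bash
#!/bin/bash

string_result="no"
numeric_result="no"

# \< inside [ ] is a string comparison: "5" sorts after "12"
# because the character '5' comes after '1'.
[ 5 \< 12 ] && string_result="yes"

# -lt compares the operands as integers.
[ 5 -lt 12 ] && numeric_result="yes"

echo "string comparison says 5 < 12:  $string_result"
echo "numeric comparison says 5 < 12: $numeric_result"
```

The string comparison reports `no` and the numeric one reports `yes`. With single-digit counts both agree, which would explain why small test arrays appeared to sort correctly. Replacing `\<` with `-lt`, or using `(( ... ))`, would make the swap test numeric.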

But what you actually need is something like:

j=0
for num in "${numArray[@]}"; do
    echo "$num: ${wordArray[j++]}"
done |
sort -rn -k1,1    # -r puts the highest count first, as the question asks

But earlier in the process, you should have used only one array, in which each word and its frequency (or vice versa) form a key-value pair. Then they always have a direct relationship and can be printed similarly to the for loop above.
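As a rough sketch of that idea, assuming bash 4 or later (for associative arrays) and words that contain no tabs or newlines, the counts could be kept in a single array keyed by word. The array name `count` and the sample word list are made up for illustration:

```bash
#!/bin/bash
# Requires bash 4+ for associative arrays (declare -A).

declare -A count=()

# Hypothetical input: in the real script the words would come from the text file.
for word in the why the maybe why the; do
    count[$word]=$(( ${count[$word]:-0} + 1 ))
done

# Print "count<TAB>word" pairs, highest count first; ties are ordered by word.
sorted=$(
    for word in "${!count[@]}"; do
        printf '%s\t%s\n' "${count[$word]}" "$word"
    done | sort -t $'\t' -k1,1nr -k2,2
)

echo "$sorted"
```

For this sample input the lines come out as `3 the`, `2 why`, `1 maybe` (tab-separated). Because each word is the key for its own count, there is no second array to keep in step.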

Upvotes: 0
