ex1led
ex1led

Reputation: 469

Optimally finding the index of the maximum element in BASH array

I am using bash in order to process software responses on-the-fly and I am looking for a way to find the index of the maximum element in the array.

The data that gets fed to the bash script is like this:

25 9
72 0
 3 3
 0 4
 0 7

And so I create two arrays. There is

arr1 = [ 25 72 3 0 0 ]
arr2 = [ 9   0 3 4 7 ]

And what I need is to find the index of the maximum number in arr1 in order to use it also for arr2. But I would like to see if there is a quick - optimal way to do this.

Would it maybe be better to use a dictionary structure [key][value] with the data I have? Would this make the process easier?

I have also found [1] (from user jhnc) but I don't quite think it is what I want.

My brute - force approach is the following:


function MAX {

   arr1=( 25 72 3 0 0 )
   arr2=( 9   0 3 4 7 )

   local indx=0
   local max=${arr1[0]}
   local flag

   for ((i=1; i<${#arr1[@]};i++)); do

       #To avoid invalid arithmetic operators when items are floats/doubles
       flag=$( python <<< "print(${arr1$[${i}]} > ${max})")    

       if [ $flag == "True" ]; then

           indx=${i}
           max=${arr1[${i}]}

       fi

    done

    echo "MAX:INDEX = ${max}:${indx}"
    echo "${arr1[${indx}]}"
    echo "${arr2[${indx}]}"

}

This approach obviously will work, BUT, is it the optimal one? Is there a faster way to perform the task?

arr1 = [ 99.97 0.01 0.01 0.01 0 ]
arr2 = [ 0 6 4 3 2 ]

In this example, if an array contains floats then I would get a

syntax error: invalid arithmetic operator (error token is ".97)

So, I am using

flag=$( python <<< "print(${arr1$[${i}]} > ${max})")

In order to overcome this issue.

Upvotes: 1

Views: 1100

Answers (2)

dan
dan

Reputation: 5231

It's difficult to do it efficiently if you really do need to compare floats. Bash can't do floats, which means invoking an external program for every number comparison. However, comparing every number in bash, is not necessarily needed.

Here is a fast, pure bash, integer only solution, using comparison:

#!/bin/bash

arr1=( 25 72 3 0 0)
arr2=( 9   0 3 4 7)

# Get the maximum, and also save its index(es)
for i in "${!arr1[@]}"; do
        if ((arr1[i]>arr1_max)); then
                arr1_max=${arr1[i]}
                max_indexes=($i)
        elif [[ "${arr1[i]}" == "$arr1_max" ]]; then
                max_indexes+=($i)
        fi
done

# Print the results
printf '%s\n' \
        "Array1 max is $arr1_max" \
        "The index(s) of the maximum are:" \
        "${max_indexes[@]}" \
        "The corresponding values from array 2 are:"

for i in "${max_indexes[@]}"; do
        echo "${arr2[i]}"
done

Here is another optimal method, that can handle floats. Comparison in bash is avoided altogether. Instead the much faster sort(1) is used, and is only needed once. Rather than starting a new python instance for every number.

#!/bin/bash

arr1=( 25 72 3 0 0)
arr2=( 9  0 3 4 7)

arr1_max=$(printf '%s\n' "${arr1[@]}" | sort -n | tail -1)
for i in "${!arr1[@]}"; do
        [[ "${arr1[i]}" == "$arr1_max" ]] &&
        max_indexes+=($i)
done

# Print the results
printf '%s\n' \
        "Array 1 max is $arr1_max" \
        "The index(s) of the maximum are:" \
        "${max_indexes[@]}" \
        "The corresponding values from array 2 are:"

for i in "${max_indexes[@]}"; do
        echo "${arr2[i]}"
done

Example output:

Array 1 max is 72
The index(s) of the maximum are:
1
The corresponding values from array 2 are:
0

Unless you need those arrays, you can also feed your input script directly in to something like this:

#!/bin/bash

input-script |
sort -nr |
awk '
(NR==1) {print "Max: "$1"\nCorresponding numbers:"; max = $1}
{if (max == $1) print $2; else  exit}'

Example (with some extra numbers):

$ echo \
'25 9
72 0
72 11
72 4
 3 3
 3 14
 0 4
 0 1
 0 7' |
sort -nr |
awk '(NR==1) {max = $1; print "Max: "$1"\nCorresponding numbers:"}
{if (max == $1) print $2; else  exit}'
Max: 72
Corresponding numbers:
4
11
0

You can also do it 100% in awk, including sorting:

$ echo \
'25 9
72 0
72 11
72 4
 3 3
 3 14
 0 4
 0 1
 0 7' |
awk '
{
    col1[a++] = $1
    line[a-1] = $0
}

END {
    asort(col1)
    col1_max = col1[a-1]
    print "Max is "col1_max"\nCorresponding numbers are:"

    for (i in line) {
        if (line[i] ~ col1_max"\\s") {
            split(line[i], max_line)
            print max_line[2]
        }
    }
}'
Max is 72
Corresponding numbers are:
0
11
4

Or, just to get the maximum of column 1, and any single number from column 2, that corresponds with it. As simply as possible:

$ echo \
'25 9
72 0
 3 3
 0 4
 0 7' |
sort -nr |
head -1
72 0

Upvotes: 0

chepner
chepner

Reputation: 531055

Finding a maximum is inherently an O(n) operation. But there's no need to spawn a Python process on each iteration to perform the comparison. Write a single awk script instead.

awk 'BEGIN {
   split(ARGV[1], a1);
   split(ARGV[2], a2);
   max=a1[1];
   indx=1;
   for (i in a1) {
     if (a1[i] > max) {
       indx = i;
       max = a1[i];
     }
   }
   print "MAX:INDEX = " max ":" (indx - 1)
   print a1[indx]
   print a2[indx]
}' "${arr1[*]}" "${arr2[*]}"

The two shell arrays are passed as space-separated strings to awk, which splits them back into awk arrays.

Upvotes: 3

Related Questions