Egill Richard

Reputation: 31

bash - How to find the index of the maximum value of a row in a matrix

Here's my input :

chr1 58962 -0.042053 -22.525086 -20.817409 -19.525688 
chr1 58989 -0.014479 -14.459352 -12.824315 -11.744024
chr1 59155 -0.062963 -13.810858 -12.749009 -12.102778
chr1 59256 -0.014105 -7.371202  -9.117587  -11.525907

I'm looking for a way, in bash, to get the index of the maximum value of the row for each row. I don't want to take into account the first two columns.

I could do it very simply in R :

data=fread(myfile)
maxindex=apply(data[,3:6],1,which.max)

So that the output is an array containing the index. This is the kind of output I want in the end. In this case :

maxindex= 1  1  1  1

Unfortunately, the whole file is 32 GB (a big table containing 300000 rows and 8183 columns), so R can't handle it even after I subset the original file. I've read that bash isn't made to work row by row, but is there still a way to do what I want?

Upvotes: 2

Views: 1211

Answers (3)

Brysen

Reputation: 11

If you want the script written with basic bash operations, you could do something like this:

#!/bin/bash

# Function to find the max-value of a one-dimensional array
findMax() 
{   
    [[ -z $2 ]] && return # Exit early if the string is empty

    declare -a pararr=("$@") # Copy the arguments into an array we can work with

    # Basic brute-force algorithm to find the highest value in the array 
    maxInd=2 
    for (( i = 3; i < $#; i++ )); do
        (( $(echo "${pararr[$i]} > ${pararr[$maxInd]}" | bc) )) && maxInd=$i
    done

    echo -n " $(( maxInd - 2 ))"
}

echo -n "Maxindex:"

# Feed our findMax row-by-row from the input file
while read -r line; do
    findMax $line
done < "${!#}" # The input file is the last argument given to the script


echo # Append newline at the end

This script takes a file that is formatted like your example and searches for the max index row by row. However, each row in the file must be separated by a newline, as in your example, or the results will be unpredictable. You can of course extend the script to deal with other formats if you wish.

However, if you want to do this operation on very large files, I think the solutions provided by the others here will be much better suited. I don't really know much about the overhead of bash, since I use C/C++ for most performance-critical applications, but I'm guessing it's not very efficient.

(( $(echo "${pararr[$i]} > ${pararr[$maxInd]}" | bc) )) && maxInd=$i

This part of the script is really ugly, but I don't know of any better way to do floating-point arithmetic. What we are doing here is comparing the value at our current position in the row with the largest value we have found so far. So this:

echo "${pararr[$i]} > ${pararr[$maxInd]}"

might expand to something like this:

0.356 > 1.567

We then pipe it into bc, which does the floating-point comparison for us and prints 1 if it is true and 0 otherwise. If the value at our current position is greater than the greatest value we have found so far, we set maxInd to the current index. Hope this helps.
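To see the comparison step in isolation, here is a minimal sketch. It is an assumption on my part, not the script above: float_gt is a hypothetical helper name, and it uses awk rather than bc for the floating-point comparison so that it also runs where bc is not installed.

```shell
#!/bin/bash

# float_gt A B: succeed (exit status 0) when A > B, compared as numbers.
# Hypothetical helper for illustration; awk does the floating-point comparison.
float_gt() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 > b + 0) }'
}

# Same shape as the test inside the loop: only update when the new value is larger.
if float_gt 0.356 1.567; then echo "greater"; else echo "not greater"; fi
if float_gt -7.371202 -9.117587; then echo "greater"; else echo "not greater"; fi
```

Like the bc version, this still starts one extra process per comparison, so it is no faster on large files.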

Upvotes: 0

RomanPerekhrest

Reputation: 92854

Use the following awk solution; it should run faster than the Perl approach (on "big" files):

awk '{ m=$3; p=1; for(i=4;i<=NF;i++) { 
           if ($i>m) { m=$i; p=i-2 } } printf "%d ",p }' file > max_indices
  • m=$3; p=1 - initial maximum value (the 3rd field) and its position

  • for(i=4;i<=NF;i++) - iterating through the remaining fields

  • if ($i>m) { m=$i; p=i-2 } - capturing the new maximum and its position relative to the 3rd field
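The steps above can be tried on the sample rows from the question with a small wrapper. This is only a sketch under a few assumptions of mine: row_argmax is a hypothetical function name, the "+ 0" is added to force a numeric comparison, and it prints one index per line instead of a single space-separated line.

```shell
#!/bin/bash

# row_argmax: read whitespace-separated rows on stdin and print, one per line,
# the 1-based position of the largest value among fields 3..NF.
row_argmax() {
    awk '{
        m = $3 + 0; p = 1              # start with the 3rd field as the maximum
        for (i = 4; i <= NF; i++) {
            if ($i + 0 > m) {          # "+ 0" forces a numeric comparison
                m = $i + 0; p = i - 2  # position relative to the 3rd field
            }
        }
        print p
    }'
}

# Sample rows copied from the question.
row_argmax <<'EOF'
chr1 58962 -0.042053 -22.525086 -20.817409 -19.525688
chr1 58989 -0.014479 -14.459352 -12.824315 -11.744024
chr1 59155 -0.062963 -13.810858 -12.749009 -12.102778
chr1 59256 -0.014105 -7.371202  -9.117587  -11.525907
EOF
```

For the real file, the same function can read it directly, for example row_argmax < file > max_indices.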

Upvotes: 2

choroba

Reputation: 241828

Perl solution:

perl -ane '$r = 2;
           for my $i (3 .. $#F) {
               $r = $i if $F[$i] > $F[$r];
           }
           print $r - 1, " ";
          ' < input-file > output-file
  • -n processes the input line by line
  • -a splits each line on whitespace into the @F array
  • $r stores the index of the maximum (set to 2 before processing each line)
  • in the for loop, we try all the other indices and store the index of the max if we find it
  • after having processed the whole line, we output the index - 1 (because indices start from 0 in Perl and you want to ignore the first 2)

Upvotes: 0
