user3719014

Reputation: 59

Sorting Columns From File BASH

I have the following shell script that reads data from a file named on the command line. The file is a matrix of numbers, and I need to split it into columns and then sort each column. Right now I can read the file and output the individual columns, but I am lost on how to sort them. I have added a sort statement, but it only sorts the first column.

EDIT: I have decided to take another route and actually transpose the matrix to turn the columns into rows. I later have to calculate the mean and median, and I have already done this successfully row-wise earlier in the script, so it was suggested that I "spin" the matrix, if you will, to turn the columns into rows.

Here is my UPDATED code

    declare -a col=( )
    read -a line < "$1"
    numCols=${#line[@]}                          # save number of columns

    index=0
    while read -a line ; do
        for (( colCount=0; colCount<${#line[@]}; colCount++ )); do
            col[$index]=${line[$colCount]}
            ((index++))
        done
    done < "$1"

    for (( width = 0; width < numCols; width++ )); do
        for (( colCount = width; colCount < ${#col[@]}; colCount += numCols )); do
            printf "%s\t" ${col[$colCount]}
        done
        printf "\n"
    done

This gives me the following output:

    1 9 6 3 3 6
    1 3 7 6 4 4
    1 4 8 8 2 4
    1 5 9 9 1 7
    1 5 7 1 4 7

Though I'm now looking for:

    1 3 3 6 6 9
    1 3 4 4 6 7
    1 2 4 4 8 8
    1 1 5 7 9 9
    1 1 4 5 7 7

To try and sort the data, I have tried the following:

    sortCol=${col[$colCount]}
    eval col[$colCount]='($(sort <<<"${'$sortCol'[*]}"))'

Also: (which is how I sorted the row after reading in from line)

    sortCol=( $(printf '%s\t' "${col[$colCount]}" | sort -n) )

If you could provide any insight on this, it would be greatly appreciated!

Upvotes: 3

Views: 351

Answers (4)

Juan Diego Godoy Robles

Reputation: 14955

Not bash, but I think this Python code is worth a look: it shows how the task can be done using built-in functions.

From the interpreter:

$ cat matrix.txt 
1 1 1 1 1
9 3 4 5 5
6 7 8 9 7
3 6 8 9 1
3 4 2 1 4
6 4 4 7 7

$ python
Python 2.7.3 (default, Jun 19 2012, 17:11:17) 
[GCC 4.4.3] on hp-ux11
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> f = open('./matrix.txt')
>>> for row in zip(*[sorted(list(a))
...                for a in zip(*[a.split() for a in f.readlines()])]):
...    print ' '.join(row)
... 
1 1 1 1 1
3 3 2 1 1
3 4 4 5 4
6 4 4 7 5
6 6 8 9 7
9 7 8 9 7

Upvotes: 0

Mike S

Reputation: 1361

Here's my entry in this little exercise. Should handle an arbitrary number of columns. I assume they're space-separated:

#!/bin/bash

linenumber=0
while read line; do
        i=0
        # Create an array for each column.
        for number in $line; do
                [ $linenumber == 0 ] && eval "array$i=()"
                eval "array$i+=($number)"
                (( i++ ))
        done    
        (( linenumber++ ))
done <"$1"
IFS=$'\n'
# Sort each column numerically (columns are numbered 0 to i-1)
for j in $(seq 0 $(( i - 1 ))); do
        thisarray=array$j
        eval array$j='($(sort -n <<<"${'$thisarray'[*]}"))'
done    
# Print each array's 0'th entry, then 1, then 2, etc...
for k in $(seq 0 $(( ${#array0[@]} - 1 ))); do
        for j in $(seq 0 $(( i - 1 ))); do
                eval 'printf "%s " "${array'$j'['$k']}"'
        done    
        echo "" 
done
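
The sorting step in the script above depends on sort being line-oriented: with IFS set to a newline, "${array[*]}" expands to one element per line. The following is a minimal sketch of that idea on its own, assuming bash 4 or later for mapfile; the array names and sample values are made up for illustration.

```shell
#!/bin/bash
# sort(1) compares whole lines, so array elements must be emitted one per line.
# "values", "one_line" and "sorted" are illustrative names only.
values=(9 10 6 3 3 6)

# Joined with tabs, the values form a single line, so sort has nothing to reorder.
one_line=$(printf '%s\t' "${values[@]}" | sort -n)

# Emitted one per line, sort -n orders them numerically (10 after 9, not before 3).
mapfile -t sorted < <(printf '%s\n' "${values[@]}" | sort -n)

echo "${sorted[*]}"
```

Without -n, sort compares the values as text, which only happens to give the right order when every number has the same number of digits.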

Upvotes: 0

user4453924

Reputation:

Awk script (needs GNU awk, since asort is a gawk extension)

awk '
{for(i=1;i<=NF;i++)a[i]=a[i]" "$i}      #Add to column array
END{
        for(i=1;i<=NF;i++){
                split(a[i],b)          #Split column
                x=asort(b)             #sort column
                for(j=1;j<=x;j++){     #loop through sort
                        d[j]=d[j](d[j]~/./?" ":"")b[j]  #Recreate lines
                }
        }
for(i=1;i<=NR;i++)print d[i]          #Print lines
}' file

Output

1 1 1 1 1
3 3 2 1 1
3 4 4 5 4
6 4 4 7 5
6 6 8 9 7
9 7 8 9 7
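
If GNU awk is not available, the same column-by-column idea can be sketched with standard tools driven from bash: extract each column, sort it with sort -n, and join the results with paste. This is only a sketch under assumptions: the function name sort_columns is hypothetical, the input is whitespace-separated, and every row has as many fields as the first one.

```shell
#!/bin/bash
# sort_columns is a hypothetical helper name, not taken from the answers here.
sort_columns() {
    local file=$1 tmpdir c
    local -a first parts=()
    # Count the fields on the first line to get the number of columns.
    read -r -a first < "$file"
    (( ${#first[@]} > 0 )) || return 1
    tmpdir=$(mktemp -d) || return 1
    for (( c = 1; c <= ${#first[@]}; c++ )); do
        # Extract column c, sort it numerically, and keep it in its own file.
        awk -v c="$c" '{ print $c }' "$file" | sort -n > "$tmpdir/col$c"
        parts+=( "$tmpdir/col$c" )
    done
    # Glue the sorted columns back together, one column per field.
    paste -d' ' "${parts[@]}"
    rm -rf "$tmpdir"
}

# Run on the file named on the command line, if one was given.
if [[ $# -gt 0 ]]; then
    sort_columns "$1"
fi
```

It reads the file once per column, which is fine for a small matrix but slower than the single-pass awk script for wide files.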

Upvotes: 0

David C. Rankin

Reputation: 84551

Note, as mentioned in the comments, a pure bash solution isn't pretty. There are a number of ways to do it, but this is probably the most straightforward. The script reads all values, line by line, into one flat array and saves the matrix stride (the number of columns). Using the stride, it collects the values of each column, sorts them, and appends the sorted column to a new array, a2, so that a2 holds the sorted columns one after another as rows. Printing a2 transposed gives back the original matrix layout with each column sorted.

Note that this will work for any number of columns in your file.

#!/bin/bash

test -z "$1" && {           ## validate number of input
    printf "insufficient input. usage:  %s <filename>\n" "${0//*\//}"
    exit 1;
}

test -r "$1" || {           ## validate file was readable
    printf "error: file not readable '%s'. usage:  %s <filename>\n" "$1" "${0//*\//}"
    exit 1;
}

## function: my sort integer array - accepts array and returns sorted array
## Usage: array=( $(msia "${array[@]}") )
msia() {
    local a=( "$@" )
    local sz=${#a[@]}
    local _tmp
    [[ $sz -lt 2 ]] && { echo "Warning: array not passed to fxn 'msia'" >&2; return 1; }
    for((i=0;i<$sz;i++)); do
        for((j=$((sz-1));j>i;j--)); do
        [[ ${a[$i]} -gt ${a[$j]} ]] && {
            _tmp=${a[$i]}
            a[$i]=${a[$j]}
            a[$j]=$_tmp
        }
        done
    done
    echo ${a[@]}
    unset _tmp
    unset sz
    return 0
}

declare -a a1               ## declare arrays and matrix variables
declare -a a2
declare -i cnt=0
declare -i stride=0
declare -i sz=0

while read line; do         ## read all lines into array
    a1+=( $line );
    (( cnt == 0 )) && stride=${#a1[@]}  ## calculate matrix stride
    (( cnt++ ))
done < "$1"

sz=${#a1[@]}                ## calculate matrix size
                            ## print original array
printf "\noriginal array:\n\n"
for ((i = 0; i < sz; i += stride)); do
    for ((j = 0; j < stride; j++)); do
        printf " %s" ${a1[i+j]}
    done
    printf "\n"
done

                            ## sort columns from stride array
for ((j = 0; j < stride; j++)); do
    for ((i = 0; i < sz; i += stride)); do
        arow+=( ${a1[i+j]} )
    done
    a2+=( $(msia ${arow[@]}) )  ## create sorted array
    unset arow
done
                            ## print the sorted array
printf "\nsorted array:\n\n"
for ((j = 0; j < cnt; j++)); do
    for ((i = 0; i < sz; i += cnt)); do
        printf " %s" ${a2[i+j]}
    done
    printf "\n"
done

exit 0

Output

$ bash sort_cols2.sh dat/matrix.txt

original array:

 1 1 1 1 1
 9 3 4 5 5
 6 7 8 9 7
 3 6 8 9 1
 3 4 2 1 4
 6 4 4 7 7

sorted array:

 1 1 1 1 1
 3 3 2 1 1
 3 4 4 5 4
 6 4 4 7 5
 6 6 8 9 7
 9 7 8 9 7
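
For comparison, here is a shorter sketch of the same flat-array and stride idea, with sort -n replacing the hand-written sort function. It is an illustration rather than a drop-in replacement: the function name sort_cols_stride is hypothetical, mapfile needs bash 4 or later, and every row is assumed to have the same number of fields.

```shell
#!/bin/bash
# sort_cols_stride is a hypothetical name; the approach mirrors the script above.
sort_cols_stride() {
    local file=$1 i j
    local -a flat=() sorted=() col=() row=()
    local stride=0 rows=0
    # Read every value into one flat array; the stride is the column count.
    while read -r -a row; do
        (( ${#row[@]} )) || continue          # skip blank lines
        (( stride )) || stride=${#row[@]}
        flat+=( "${row[@]}" )
        rows=$(( rows + 1 ))
    done < "$file"
    # Walk the flat array with the stride to collect each column, then sort it.
    for (( j = 0; j < stride; j++ )); do
        col=()
        for (( i = j; i < ${#flat[@]}; i += stride )); do
            col+=( "${flat[i]}" )
        done
        mapfile -t col < <(printf '%s\n' "${col[@]}" | sort -n)
        sorted+=( "${col[@]}" )               # sorted columns stored back to back
    done
    # Transpose on output: row i takes element i of every sorted column.
    for (( i = 0; i < rows; i++ )); do
        row=()
        for (( j = 0; j < stride; j++ )); do
            row+=( "${sorted[j * rows + i]}" )
        done
        printf '%s\n' "${row[*]}"
    done
}

# Run on the file named on the command line, if one was given.
if [[ $# -gt 0 ]]; then
    sort_cols_stride "$1"
fi
```

Calling an external sort once per column is no longer pure bash, but it removes the quadratic sorting loop and the need to validate the array size by hand.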

Upvotes: 1
