Michael Gruenstaeudl
Michael Gruenstaeudl

Reputation: 1783

Sort text columns by number of lines in bash

Assume a text file that contains x number of string columns.

$cat file # where x=3
foo  foo  foo
bar  bar  bar
     baz  baz
     qux

Is there a way in bash to sort these columns by the number text strings (i.e., filled rows) they contain, while maintaining the internal order of rows in each column?

$sought_command file
foo  foo  foo
bar  bar  bar
baz  baz
qux

Essentially, the column with the most number of rows is to be first, the column with the second-most number of rows is to be second, etc.

(This task would be easy to implement via R, but I am wondering about a solution via bash.)

EDIT 1:

Here are some additional details: Every column contains at least one text string (i.e., one filled row). The text strings may constitute any alphanumeric combination and have any length (but obviously do not contain spaces). The output columns must not have blank rows inserted. There is no a priori limitation on the column delimiter, as long as it remains consistent across the table.

All that is needed for this task is to shift the columns around as-is such that they are sorted by column length. (I know that implementing this in bash sounds easier than it actually is.)

Upvotes: 6

Views: 228

Answers (4)

anubhava
anubhava

Reputation: 785058

Create a function called transpose like this first:

transpose() {
   awk -v FPAT='[^[:blank:]]+|[ \t]{3,}' '{
     for (i=1; i<=NF; i++)
        a[i,NR]=$i
        max=(max<NF?NF:max)
     }
     END {for (i=1; i<=max; i++)
        for (j=1; j<=NR; j++)
           printf "%s%s", a[i,j], (j==NR?ORS:OFS)
   }'
}

Then use it as:

transpose < file | awk '{print NF "\t" $0}' | sort -k1nr | cut -f2- | transpose

foo foo foo
bar bar bar
baz baz
qux

Steps are:

  1. Call transpose function to transpose column into rows
  2. Use awk to add # of fields at the start of each line
  3. Use sort in reverse numerical order of first column
  4. Use cut to get rid of first column
  5. Call transpose again to transpose column into rows to get original order

PS: Due to use of FPAT we will need gnu-awk here.

Upvotes: 1

Eric Duminil
Eric Duminil

Reputation: 54223

sed -e 's/^ *//' columns.txt
# =>
# foo  foo  foo
# bar  bar  bar
# baz  baz
# qux

I'll be here all week! :D

On a more serious note, you might want to transpose your columns with bash, with either awk or rs. This will make it much easier to sort your columns (now rows), and transpose them back again. Multiple spaces could pose a problem to awk, though.

Upvotes: 0

Ed Morton
Ed Morton

Reputation: 203324

With GNU awk for sorted_in and assuming your columns are tab-separated:

$ cat tst.awk
BEGIN{ FS=OFS="\t" }
{
    for (i=1; i<=NF; i++) {
        if ($i ~ /[^[:space:]]/) {
            cell[NR,i] = $i
            cnt[i]++
        }
    }
    next
}
END {
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (row=1; row<=NR; row++) {
        c=0
        for (col in cnt) {
            printf "%s%s", (c++?OFS:""), cell[row,col]
        }
        print ""
    }
}

$ awk -f tst.awk file
foo     foo     foo
bar     bar     bar
baz     baz
qux

Upvotes: 4

karakfa
karakfa

Reputation: 67467

with unix toolset

$ tr '\t' '\n' <file                  | 
  pr -4ts                             |  
  awk '{print gsub(/-/,"-") "\t" $0}' | 
  sort -k1n                           | 
  cut -f2-                            | 
  tr '\t' '\n'                        | 
  pr -3ts

foo     foo     foo
bar     bar     bar
baz     baz     -
qux     -       -

assumes the columns are tab separated and the missing values are represented with "-". The magic numbers 4 and 3 are the number of rows and columns respectively.

Used this as the input file

$ cat file
foo     foo     foo
bar     bar     bar
-       baz     baz
-       qux     -

Upvotes: 1

Related Questions