Reputation: 1783
Assume a text file that contains x number of string columns.
$cat file # where x=3
foo foo foo
bar bar bar
baz baz
qux
Is there a way in bash to sort these columns by the number text strings (i.e., filled rows) they contain, while maintaining the internal order of rows in each column?
$sought_command file
foo foo foo
bar bar bar
baz baz
qux
Essentially, the column with the most number of rows is to be first, the column with the second-most number of rows is to be second, etc.
(This task would be easy to implement via R
, but I am wondering about a solution via bash.)
EDIT 1:
Here are some additional details: Every column contains at least one text string (i.e., one filled row). The text strings may constitute any alphanumeric combination and have any length (but obviously do not contain spaces). The output columns must not have blank rows inserted. There is no a priori limitation on the column delimiter, as long as it remains consistent across the table.
All that is needed for this task is to shift the columns around as-is such that they are sorted by column length. (I know that implementing this in bash sounds easier than it actually is.)
Upvotes: 6
Views: 228
Reputation: 785058
Create a function called transpose like this first:
transpose() {
awk -v FPAT='[^[:blank:]]+|[ \t]{3,}' '{
for (i=1; i<=NF; i++)
a[i,NR]=$i
max=(max<NF?NF:max)
}
END {for (i=1; i<=max; i++)
for (j=1; j<=NR; j++)
printf "%s%s", a[i,j], (j==NR?ORS:OFS)
}'
}
Then use it as:
transpose < file | awk '{print NF "\t" $0}' | sort -k1nr | cut -f2- | transpose
foo foo foo
bar bar bar
baz baz
qux
Steps are:
transpose
function to transpose column into rowsawk
to add # of fields at the start of each linesort
in reverse numerical order of first columncut
to get rid of first columntranspose
again to transpose column into rows to get original orderPS: Due to use of FPAT
we will need gnu-awk here.
Upvotes: 1
Reputation: 54223
sed -e 's/^ *//' columns.txt
# =>
# foo foo foo
# bar bar bar
# baz baz
# qux
I'll be here all week! :D
On a more serious note, you might want to transpose your columns with bash, with either awk
or rs
. This will make it much easier to sort your columns (now rows), and transpose them back again.
Multiple spaces could pose a problem to awk
, though.
Upvotes: 0
Reputation: 203324
With GNU awk for sorted_in and assuming your columns are tab-separated:
$ cat tst.awk
BEGIN{ FS=OFS="\t" }
{
for (i=1; i<=NF; i++) {
if ($i ~ /[^[:space:]]/) {
cell[NR,i] = $i
cnt[i]++
}
}
next
}
END {
PROCINFO["sorted_in"] = "@val_num_desc"
for (row=1; row<=NR; row++) {
c=0
for (col in cnt) {
printf "%s%s", (c++?OFS:""), cell[row,col]
}
print ""
}
}
$ awk -f tst.awk file
foo foo foo
bar bar bar
baz baz
qux
Upvotes: 4
Reputation: 67467
with unix toolset
$ tr '\t' '\n' <file |
pr -4ts |
awk '{print gsub(/-/,"-") "\t" $0}' |
sort -k1n |
cut -f2- |
tr '\t' '\n' |
pr -3ts
foo foo foo
bar bar bar
baz baz -
qux - -
assumes the columns are tab separated and the missing values are represented with "-". The magic numbers 4 and 3 are the number of rows and columns respectively.
Used this as the input file
$ cat file
foo foo foo
bar bar bar
- baz baz
- qux -
Upvotes: 1