Reputation: 21274
Say I have a file:
ab
aa
c
aaaa
I would like it to be sorted like this:
c
aa
ab
aaaa
That is, sort by line length and then alphabetically. Is that possible in bash?
Upvotes: 3
Views: 1672
Reputation: 28406
You can prepend the length of the line to each line, then sort numerically, and finally cut out the numbers:
< your_file awk '{ print length($0), $0 }' | sort -n | cut -d' ' -f2-
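To make the three stages visible, here's a small sketch that runs the pipeline on the question's sample lines. The sample function and the use of printf in place of your_file are only there to keep it self-contained. It uses cut -d' ' -f2- because awk's comma separates the fields with a space (the default OFS), while cut splits on tabs unless told otherwise.

```shell
#!/bin/sh
# Hypothetical stand-in for your_file: the sample lines from the question.
sample() { printf '%s\n' ab aa c aaaa; }

# Stage 1: prefix each line with its length; the comma makes awk emit OFS (a space).
sample | awk '{ print length($0), $0 }'

# Stages 2 and 3: numeric sort, then keep everything after the first space.
sorted=$(sample | awk '{ print length($0), $0 }' | sort -n | cut -d' ' -f2-)
printf '%s\n' "$sorted"
```

Stage 1 prints `2 ab`, `2 aa`, `1 c`, `4 aaaa`, and the final result is `c`, `aa`, `ab`, `aaaa`.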
You see that I've accomplished the sorting via sort -n, without doing any multi-key sorting. Honestly, I was lucky that this worked:
I didn't consider that lines could begin with digits, so I expected sort -n to work because alphabetic and numeric sorting give the same result when all the strings have the same length, which is exactly the case here because the prefix I add via awk is the line length itself.
It turns out everything works even if your input has lines starting with digits, the reason being that when two lines compare equal numerically, sort -n falls back to strcmp to compare the whole lines. Here's a demo:
$ echo -e '3 11\n3 2' | sort -n
3 11
3 2
# the `3 ` on both lines makes them equal for numerical sorting
# but `3 11` comes before `3 2` by `strcmp` because `1` comes before `2`
$ echo -e '3 11\n03 2' | sort -n
03 2
3 11
# the `03 ` vs `3 ` is a numerical tie,
# but `03 2` comes before `3 11` by `strcmp` because `0` comes before `3`
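Here's a sketch that checks the claim end to end with input lines that themselves start with digits. Prefixing sort with LC_ALL=C is my addition: it makes the whole-line tie-break a plain byte comparison (strcmp-like), whereas in other locales GNU sort uses the locale's collation for that last-resort comparison.

```shell
#!/bin/sh
# Input lines that begin with digits; after prefixing, awk emits
# "2 10", "1 9", "2 1a", "1 b".
# sort -n reads only the leading number (the length); ties fall back to a
# whole-line comparison, which LC_ALL=C turns into plain byte order.
result=$(printf '%s\n' 10 9 1a b | awk '{ print length($0), $0 }' | LC_ALL=C sort -n | cut -d' ' -f2-)
printf '%s\n' "$result"
```

The output is `9`, `b`, `10`, `1a`: shorter lines first, and within each length the digits in the data never leak into the numeric key.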
So the lucky part is that the comma I included in the awk command inserts a space (actually an OFS), i.e. a non-digit, thus "breaking" the numeric sorting and letting the strcmp sorting kick in (on the whole lines, which compare equal numerically in this case).
Whether this behavior is POSIX or not, I don't know, but I'm using GNU coreutils 8.32's sort. Refer to this question of mine and this answer on Unix for details.
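If you'd rather not depend on the whole-line tie-break at all, here's a sketch of an explicit two-key variant (not the command from this answer): -k1,1n compares only the first field, numerically, and -k2 compares from the second field to the end of the line as text. LC_ALL=C again just pins the text comparison to byte order.

```shell
#!/bin/sh
# Key 1: the length prefix only (field 1), compared numerically.
# Key 2: field 2 through end of line, i.e. the original line with one leading blank.
explicit=$(printf '%s\n' 10 9 1a b |
  awk '{ print length($0), $0 }' |
  LC_ALL=C sort -k1,1n -k2 |
  cut -d' ' -f2-)
printf '%s\n' "$explicit"
```

Because every prefixed line has exactly one blank before the data, that blank compares equal everywhere and key 2 effectively orders the original lines.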
awk could do it all by itself, but I think using sort to sort is more idiomatic (as in, use sort to sort) and more efficient, as explained in a comment (after all, why wouldn't you expect sort to be the best-performing tool in the shell for sorting?).
Upvotes: 11
Reputation: 37404
For GNU awk:
$ gawk '{
    a[length()][$0]++                          # hash to 2d array
}
END {
    PROCINFO["sorted_in"]="@ind_num_asc"       # first sort on length dim
    for(i in a) {
        PROCINFO["sorted_in"]="@ind_str_asc"   # and then on data dim
        for(j in a[i])
            for(k=1;k<=a[i][j];k++)            # in case there are duplicates
                print j
        # no need to reset "@ind_num_asc" here: the outer loop's order is fixed when it starts
    }
}' file
Output (the test file evidently also contains two aaaaaaaaaa lines, to exercise the duplicate handling):
c
aa
ab
aaaa
aaaaaaaaaa
aaaaaaaaaa
Upvotes: 1
Reputation: 1882
Insert a length for the line using gawk (zero-filled to four places so it will sort correctly as text, which holds for lines shorter than 10000 characters), sort by two keys (first the length, then the rest of the line), then remove the length:
gawk '{printf "%04d %s\n", length($0), $0}' | sort -k1,1 -k2 | cut -d' ' -f2-
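Why the zero-fill matters: this sort compares text, not numbers (there is no n modifier), so unpadded lengths would put 10 before 9. A minimal sketch of that effect; the four-digit width is the answer's choice.

```shell
#!/bin/sh
# Text comparison of unpadded lengths: "10" < "9" because '1' < '9'.
unpadded=$(printf '%d\n' 10 9 | LC_ALL=C sort)
# Zero-filled to four places, text order matches numeric order.
padded=$(printf '%04d\n' 10 9 | LC_ALL=C sort)
printf '%s\n' "$unpadded" "$padded"
```

This prints `10`, `9` for the unpadded case and `0009`, `0010` for the padded one.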
If it must be bash:
while IFS= read -r line; do printf '%04d %s\n' "${#line}" "$line"; done | sort -k1,1 -k2 | while IFS= read -r line; do printf '%s\n' "${line#* }"; done
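Here's a sketch of the loop version wrapped in a function (sort_by_length is a hypothetical name). It sets IFS= on both read calls so leading blanks survive, and strips the prefix with ${line#* } rather than a second read into two variables. It only uses POSIX sh features, so it should run in bash as well.

```shell
#!/bin/sh
# Hypothetical helper: sort stdin by line length, then by line text.
sort_by_length() {
  # IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
  while IFS= read -r line; do
    printf '%04d %s\n' "${#line}" "$line"
  done |
    LC_ALL=C sort -k1,1 -k2 |
    while IFS= read -r line; do
      # Drop the "NNNN " prefix: everything up to and including the first space.
      printf '%s\n' "${line#* }"
    done
}

out=$(printf '%s\n' ab ' a' c | sort_by_length)
printf '%s\n' "$out"
```

The line with a leading space (` a`) keeps it, and since a space sorts before `a` in byte order, the result is `c`, ` a`, `ab`.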
Upvotes: 2