Reputation: 353
I have a tab-separated file I need to order by the length of the first field. I've found samples of a line that should do that for me, but it's giving very strange results:
awk -F\t '{print length($1) " " $0|"sort -rn"}' SpanishGlossary.utf8 | sed 's/^.[^>]*>/>/' > test.tmp
... gives this (several representative samples -- it's a very long file):
56 cafés especiales y orgánicos special and organic coffees
56 amplia experiencia gerencial broad managerial experience
55 una fundada confianza en que a well-founded confidence that
55 Servicios de Desarrollo Empresarial Business Development Services
...
6 son estas are these
6 son entregadas a are given to
6 son determinantes para are crucial for
6 son autolimitativos are self-limiting
...
0 tal grado de such a degree of
0 tales such
0 tales propósitos such purposes
0 tales principios such principles
0 tales o cuales this or that
That leading number should be the length of the first field, but it's obviously not. I don't know what that's counting.
What am I doing wrong? Thanks.
Upvotes: 0
Views: 268
Reputation: 195239
try this:
awk '$0=length($1) FS $0' file | sort -nr | sed -r 's/^\S*\s//'
test:
kent$ cat f
as foo
a foo
aaa foo
aaaaa foo
aaaa foo
kent$ awk '$0=length($1) FS $0' f|sort -nr|sed -r 's/^\S*\s//'
aaaaa foo
aaaa foo
aaa foo
as foo
a foo
Here I used a space (the default) as awk's FS; if you need a tab, add -F'\t'.
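A minimal sketch of the tab-separated case, using three rows from the question as sample input (the variable names are only for illustration). Unquoted, the shell strips the backslash from -F\t, so awk receives -Ft. An awk that then splits on the letter t would explain the output in the question: lines starting with "t" get 0, and "son estas" gets 6, the number of characters before its first t. Here cut -f2- is swapped in for the sed step, since the length prefix is joined with a tab.

```shell
# Three tab-separated rows taken from the question (phrase<TAB>translation).
input=$(printf 'son estas\tare these\ntales\tsuch\namplia experiencia gerencial\tbroad managerial experience\n')

# -F'\t' is quoted, so awk itself sees the backslash and splits on tabs.
# Prefix each line with the length of field 1, sort numerically (descending),
# then drop the prefix again by keeping fields 2 onward.
sorted=$(printf '%s\n' "$input" \
  | awk -F'\t' '{ print length($1) "\t" $0 }' \
  | sort -rn \
  | cut -f2-)

printf '%s\n' "$sorted"
```

The longest first field ("amplia experiencia gerencial", 28 characters) should come out first and "tales" (5) last.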
Added for @Jaypal: an awk-only one-liner (GNU awk).
I mention gawk because it has asort and asorti, which we can use for sorting.
I also changed the input file to add some lines whose $1 has the same length.
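It would be better to pass a numeric index ordering such as "@ind_num_asc" (or the desc variant) as the third argument, asorti(a,b,"..."), because by default the indices are compared as strings, so a length of 10 would sort before a length of 9.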
kent$ cat f
as foo
a foo
aaa foo
ccc foo
aaaaa foo
bbbbb foo
aaaa foo
kent$ awk '{a[length($1)"."NR]=$0}END{asorti(a,b);for(i=NR;i>0;i--)print a[b[i]]}' f
bbbbb foo
aaaaa foo
aaaa foo
ccc foo
aaa foo
as foo
a foo
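A hedged sketch of the numeric variant, assuming gawk 4.0 or later (which accepts a third argument to asorti). The sample input is made up so that one first field has 10 characters. With the default string comparison, the index "10.2" would sort before "3.3"; with "@ind_num_desc" the indices are compared as numbers.

```shell
# Made-up input: first fields of length 1, 10 and 3.
out=$(printf 'a foo\naaaaaaaaaa foo\naaa foo\n' | gawk '
  { a[length($1) "." NR] = $0 }            # index is "<length>.<line number>"
  END {
    n = asorti(a, b, "@ind_num_desc")      # order indices numerically, largest first
    for (i = 1; i <= n; i++) print a[b[i]]
  }')

printf '%s\n' "$out"
```

Since the order is already descending, the loop runs forward from 1 to n instead of backward from NR.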
Upvotes: 4