Reputation: 1281
I have 10 files (1Gb each). The contents of the files are as follows:
head -10 part-r-00000
a a a c b 1
a a a dumbbell 1
a a a f a 1
a a a general i 2
a a a glory 2
a a a h d 1
a a a h o 4
a a a h z 1
a a a hem hem 1
a a a k 3
I need to sort the file based on the last column of each line (descending order), which is of variable length. If there is a match on the numerical value then sort alphabetically by the 2nd last column. The following BASH command works on small datasets (not complete files) and takes 3 second to sort only 10 lines from one file.
cat part-r-00000 | awk '{print $NF,$0}' | sort -nr | cut -f2- -d' ' > FILE
I want the output in a separate FILE
. Can someone help me out to speed up the process?
Upvotes: 1
Views: 126
Reputation: 25083
You can use a Schwartzian transform to accomplish your task,
awk '{print -$NF, $(NF-1), $0}' input_file | sort -n | cut -d' ' -f3-
The awk
command prepends each record with the negative of the last field and the second last field.
The sort -n
command sorts the record stream in the required order because we used the negative of the last field.
The cut
command splits on spaces and cuts the first two fields, i.e., the ones we used to normalize the sort
$ echo 'a a a c b 1
a a a dumbbell 1
a a a f a 1
a a a general i 2
a a a glory 2
a a a h d 1
a a a h o 4
a a a h z 1
a a a hem hem 1
a a a k 3' | awk '{print -$NF, $(NF-1), $0}' | sort -n | cut -d' ' -f3-
a a a h o 4
a a a k 3
a a a glory 2
a a a general i 2
a a a f a 1
a a a c b 1
a a a h d 1
a a a dumbbell 1
a a a hem hem 1
a a a h z 1
$
Upvotes: 1
Reputation: 88776
Reverse order, sort and reverse order:
awk '{for (i=NF;i>0;i--){printf "%s ",$i};printf "\n"}' file | sort -nr | awk '{for (i=NF;i>0;i--){printf "%s ",$i};printf "\n"}'
Output:
a a a h o 4 a a a k 3 a a a general i 2 a a a glory 2 a a a h z 1 a a a hem hem 1 a a a dumbbell 1 a a a h d 1 a a a c b 1 a a a f a 1
Upvotes: 1
Reputation: 204164
No, once you get rid of the UUOC that's as fast as it's going to get. Obviously you need to add the 2nd-last field to everything too, e.g. something like:
awk '{print $NF,$(NF-1),$0}' part-r-00000 | sort -k1,1nr -k2,2 | cut -f3- -d' '
Check the sort args, I always get mixed up with those..
Upvotes: 2