Reputation: 23
I have a text file, info.txt, as below:
files-550519470 19h
files-1662192679 1d
files-247106034 1d
files-1986982365 2d
files-464153317 12m
files-739420408 3d
files-77614277 3m
files-374059185 4d
files-909323637 4d
files-101830442 5d
files-1270496134 5d
files-1797797160 6d
files-812888216 7d
files-118869238 7h
I want to sort alphabetically by the letter in the second column, and then in decreasing order of the number in that same column. The output should look like this:
files-812888216 7d
files-1797797160 6d
files-101830442 5d
files-101830442 5d
files-1270496134 5d
files-374059185 4d
files-909323637 4d
files-374059185 4d
files-909323637 4d
files-739420408 3d
files-1986982365 2d
files-1662192679 1d
files-247106034 1d
files-550519470 19h
files-118869238 7h
files-464153317 12m
files-77614277 3m
I can reverse-sort by the number with the command below, but I can't figure out how to handle the letters. Can somebody please suggest an approach?
sort -r -nk2 info.txt
Upvotes: 1
Views: 376
Reputation: 103764
Using the Decorate, Sort, Undecorate pattern:
$ sort -t $'-' -k 2 file |
sed -E 's/(.*) ([[:digit:]][[:digit:]]*)([dmh]$)/\2 \3 \1 \2\3/' |
awk 'BEGIN{arr["m"]=1; arr["h"]=60; arr["d"]=60*24}
{$2=$1*arr[$2]; $1=""; print}' |
sort -s -k1nr |
cut -d' ' -f3-
files-812888216 7d
files-1797797160 6d
files-101830442 5d
files-101830442 5d
files-1270496134 5d
files-374059185 4d
files-374059185 4d
files-909323637 4d
files-909323637 4d
files-739420408 3d
files-1986982365 2d
files-1662192679 1d
files-247106034 1d
files-550519470 19h
files-118869238 7h
files-464153317 12m
files-77614277 3m
This should be significantly faster than a Bash loop. It can be further optimized if you have gawk, which can replace sort and sed.
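To make the decorate step easier to follow, here is a minimal sketch that runs only that step on three sample lines. The function name decorate and the array name mins are made up for this illustration, and the output layout is simplified compared with the pipeline above (the minute count is printed as the first column instead of after an emptied field).

```shell
# Decorate only: prepend each line with its duration converted to minutes.
# "decorate" and "mins" are illustrative names, not part of the answer above.
decorate() {
  sed -E 's/(.*) ([[:digit:]]+)([dmh])$/\2 \3 \1 \2\3/' |
  awk 'BEGIN { mins["m"] = 1; mins["h"] = 60; mins["d"] = 60 * 24 }
       { print $1 * mins[$2], $3, $4 }'
}

printf '%s\n' 'files-812888216 7d' 'files-118869238 7h' 'files-77614277 3m' | decorate
# 10080 files-812888216 7d
# 420 files-118869238 7h
# 3 files-77614277 3m
```

Once every line carries a plain number in front, a numeric reverse sort followed by cutting that column away gives the final order.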
If you have GNU or BSD sort, you can take advantage of the fact that alphabetically d<h<m and skip the conversion:
$ sed -E 's/([^-]*)-(.*) ([[:digit:]][[:digit:]]*)([dmh]$)/\2 \4 \3 \1-\2 \3\4/' file |
sort -s -t $' ' -k2,2 -k3,3nr -k1,1 |
cut -d $' ' -f4-
# same output
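As a quick check, the same pipeline can be wrapped in a function and fed a few lines inline. This is only a sketch: the name sort_durations is hypothetical, and LC_ALL=C is an added assumption to keep the comparison of letters and ID strings byte-wise and reproducible across locales.

```shell
# Sort by unit letter (d, h, m), then by count descending, then by ID string.
# "sort_durations" is an illustrative name; LC_ALL=C is an added assumption.
sort_durations() {
  sed -E 's/([^-]*)-(.*) ([[:digit:]]+)([dmh])$/\2 \4 \3 \1-\2 \3\4/' |
  LC_ALL=C sort -s -t ' ' -k2,2 -k3,3nr -k1,1 |
  cut -d ' ' -f4-
}

printf '%s\n' \
  'files-77614277 3m' 'files-118869238 7h' 'files-247106034 1d' \
  'files-550519470 19h' 'files-812888216 7d' 'files-1662192679 1d' |
  sort_durations
# files-812888216 7d
# files-1662192679 1d
# files-247106034 1d
# files-550519470 19h
# files-118869238 7h
# files-77614277 3m
```

The two 1d lines tie on letter and count, so the third key decides; it compares the IDs as strings, which is why 1662192679 comes before 247106034.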
Upvotes: 2
Reputation: 140960
@edit
Thank you @shelter for help! We can do it in just:
sed 's/\(.*\) \([0-9]*\)\([a-zA-Z]*\)/\3 \2 \1 \2\3/' |
sort -k1,1 -k2,2nr |
cut -d' ' -f3-
sed adds two new columns in front: the first holds the letter from the original second column, the second holds the number from that column.
I leave the old answer as a reference.
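One detail worth checking when sorting decorated lines like these: a key written as -k1 runs from field 1 to the end of the line, so a following -k2nr is never consulted unless the first key is bounded as -k1,1. The sketch below compares both forms on three made-up decorated lines (files-a, files-b and files-c are placeholder names); LC_ALL=C is an added assumption for reproducible ordering.

```shell
# Three decorated sample lines: letter, number, then the original line.
sample=$(printf '%s\n' 'd 1 files-a 1d' 'd 7 files-b 7d' 'h 19 files-c 19h')

# Unbounded first key: the whole line is compared, so "d 1" lands before "d 7".
unbounded=$(printf '%s\n' "$sample" | LC_ALL=C sort -k1 -k2nr)

# Bounded keys: letter first, then the number in descending order.
bounded=$(printf '%s\n' "$sample" | LC_ALL=C sort -k1,1 -k2,2nr)

printf 'unbounded:\n%s\nbounded:\n%s\n' "$unbounded" "$bounded"
# unbounded:
# d 1 files-a 1d
# d 7 files-b 7d
# h 19 files-c 19h
# bounded:
# d 7 files-b 7d
# d 1 files-a 1d
# h 19 files-c 19h
```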
This is my idea; it works, but it is definitely not the best:
sed 's/\(.*\) \([0-9]*\)\([a-zA-Z]*\)/\3 \2 \1 \2\3/' |
sort -k1 |
{
presuffix=''
buff=''
while IFS=' ' read -r suffix rest; do
if [ "$presuffix" != "$suffix" ]; then
echo -n "$buff" | sort -n -r -k1
presuffix=$suffix
buff=''
fi
buff+="$rest"$'\n'
done
printf "%s" "$buff" | sort -n -r -k1
} |
cut -d' ' -f2-
sed splits a value such as 1d and puts its parts at the beginning of the line, so the line is prepended with d 1 ... rest of the line. The line thus gains two new columns: one we want to sort alphabetically and one we want to sort numerically.
sort -k1 groups the lines by the letter in the first column.
while read strips the letter and reverse-sorts each group numerically (the number is the first column by then).
cut -d' ' -f2- removes the first column (the number).
I don't like the while read part, but I have no better idea.
@edit:
Another solution, heavily influenced by @shelter's comment.
sed 's/\(.*\) \([0-9]*\)\([a-zA-Z]*\)/\3 \2 \1 \2\3/' |
while IFS=' ' read -r suffix num rest; do
echo "$(printf "%d * 256 + (256 - %d)\n" "'$suffix" "$num" | bc)" "$rest"
done |
sort -n |
cut -d' ' -f2-
Assuming the sorted column has a single-character suffix (1d, 1e, 1h or 19d) and its numbers are smaller than 256 (a magic number that may be increased), we can convert the character into its ASCII code.
Then we multiply the ASCII code by 256 and add to it the number from the sorted column, subtracted from 256, because within each group we want to reverse-sort by the number (7d comes first, 1d later). Then we just sort numerically.
We could alternatively use printf "(256 - %d) * 256 + %d" and then sort numerically in reverse; the only difference is in how lines with equal fields are ordered (e.g. files-1662192679 and files-247106034).
The magic number 256 should be greater than the biggest number in the sorted column and also greater than the biggest ASCII code of a character in that column. This could probably be extended to handle multiple characters in the sorted column.
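The key arithmetic can be checked by hand. The sketch below computes the composite key with shell arithmetic instead of bc; sort_key is a hypothetical helper written for this illustration, and it assumes a single-letter suffix and a number below 256 without leading zeros.

```shell
# Hypothetical helper: composite sort key for a value such as 7d.
# Assumes one suffix letter and a number < 256 with no leading zeros.
sort_key() {
  num=${1%?}                         # everything but the last character, e.g. 7
  suffix=${1#"$num"}                 # the last character, e.g. d
  code=$(printf '%d' "'$suffix")     # character code, e.g. 100 for d
  echo $(( code * 256 + (256 - num) ))
}

for v in 7d 1d 19h 3m; do
  printf '%s %s\n' "$v" "$(sort_key "$v")"
done
# 7d 25849
# 1d 25855
# 19h 26861
# 3m 28157
```

Sorting these keys in ascending numeric order gives 7d, 1d, 19h, 3m: letters in alphabetical order, and larger numbers first within each letter.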
Upvotes: 1