deepa

Reputation: 23

How do I sort the second column alphabetically and then by numbers in shell script?

I have a text file like the one below:

info.txt

files-550519470 19h
files-1662192679 1d
files-247106034 1d
files-1986982365 2d
files-464153317 12m
files-739420408 3d
files-77614277 3m
files-374059185 4d
files-909323637 4d
files-101830442 5d
files-1270496134 5d
files-1797797160 6d
files-812888216 7d
files-118869238 7h

I want to sort by the letter in the second column first, and then by the number in that same column in decreasing order. The output should look like this:

 files-812888216 7d
 files-1797797160 6d
 files-101830442 5d
 files-101830442 5d
 files-1270496134 5d
 files-374059185 4d
 files-909323637 4d
 files-374059185 4d
 files-909323637 4d
 files-739420408 3d
 files-1986982365 2d
 files-1662192679 1d
 files-247106034 1d
 files-550519470 19h
 files-118869238 7h
 files-464153317 12m
 files-77614277 3m

I can sort in reverse by the number with the command below, but I can't figure out how to handle the letters. Can somebody please suggest something?

 sort -r -nk2 info.txt
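To see why that command alone does not work, here is a small sketch on three lines copied from info.txt (the `sample` variable and the `numeric_only` function are made up for the illustration). With `-n`, sort reads only the leading digits of the key, so the suffix letter is never compared and `19h` ends up above `7d`.

```shell
#!/usr/bin/env bash
# Three lines copied from info.txt (hypothetical sample for the demo).
sample='files-812888216 7d
files-550519470 19h
files-77614277 3m'

# The question's command, fed from the variable instead of info.txt.
# -n parses only the leading digits of field 2, so 7d, 19h and 3m are
# compared as 7, 19 and 3; the d/h/m suffix is ignored.
numeric_only() {
  printf '%s\n' "$sample" | sort -r -nk2
}

numeric_only
```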

Upvotes: 1

Views: 376

Answers (2)

dawg

Reputation: 103764

Using the Decorate, Sort, Undecorate pattern:

$ sort -t $'-' -k 2 file | 
sed -E 's/(.*) ([[:digit:]][[:digit:]]*)([dmh]$)/\2 \3 \1 \2\3/' | 
awk 'BEGIN{arr["m"]=1; arr["h"]=60; arr["d"]=60*24}
     {$2=$1*arr[$2]; $1=""; print}' | 
sort -s -k1nr |
cut -d' ' -f3-
files-812888216 7d
files-1797797160 6d
files-101830442 5d
files-101830442 5d
files-1270496134 5d
files-374059185 4d
files-374059185 4d
files-909323637 4d
files-909323637 4d
files-739420408 3d
files-1986982365 2d
files-1662192679 1d
files-247106034 1d
files-550519470 19h
files-118869238 7h
files-464153317 12m
files-77614277 3m

This should be significantly faster than a Bash loop. It can be optimized further if you have gawk, which can replace the sort and sed steps.
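To make the intermediate data visible, the sketch below runs only the sed and awk stages of the first pipeline on two sample lines (`decorate_minutes` is a made-up function name). Every duration is converted to minutes, and because `$1=""` leaves an empty first field, each decorated line starts with a space; that is why the final `cut` keeps fields 3 onwards.

```shell
#!/usr/bin/env bash
# Run only the decorate stage (sed + awk) of the first pipeline above.
# decorate_minutes is a made-up name used for this illustration.
# sed:  "files-812888216 7d" -> "7 d files-812888216 7d"
# awk:  replace the unit letter with the total in minutes, then blank
#       field 1, which leaves a leading space in the output line.
decorate_minutes() {
  sed -E 's/(.*) ([[:digit:]][[:digit:]]*)([dmh]$)/\2 \3 \1 \2\3/' |
  awk 'BEGIN{arr["m"]=1; arr["h"]=60; arr["d"]=60*24}
       {$2=$1*arr[$2]; $1=""; print}'
}

printf '%s\n' 'files-812888216 7d' 'files-550519470 19h' | decorate_minutes
```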


If you have GNU or BSD sort, you can take advantage of the fact that d < h < m alphabetically and skip the conversion:

$ sed -E 's/([^-]*)-(.*) ([[:digit:]][[:digit:]]*)([dmh]$)/\2 \4 \3 \1-\2 \3\4/' file |
sort -s -t $' ' -k2,2 -k3,3nr -k1,1 |
cut -d $' ' -f4-
# same output
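The same pipeline wrapped in a function and run on four sample lines, as a sketch (`sort_letter_then_number` is a hypothetical name, and `LC_ALL=C` is added only to make the comparison byte-based). `-k2,2` and `-k3,3nr` each end at a single field, so the letter is compared first and the number only orders lines within the same letter.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the second pipeline above: letter first,
# then number in reverse, then the original file number as a tie-breaker.
# sed: "files-812888216 7d" -> "812888216 d 7 files-812888216 7d"
sort_letter_then_number() {
  sed -E 's/([^-]*)-(.*) ([[:digit:]][[:digit:]]*)([dmh]$)/\2 \4 \3 \1-\2 \3\4/' |
  LC_ALL=C sort -s -t ' ' -k2,2 -k3,3nr -k1,1 |
  cut -d ' ' -f4-
}

printf '%s\n' 'files-550519470 19h' 'files-77614277 3m' \
              'files-1662192679 1d' 'files-812888216 7d' | sort_letter_then_number
```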

Upvotes: 2

KamilCuk

Reputation: 140960

@edit

Thanks to @shelter for the help! It can be done with just:

sed 's/\(.*\) \([0-9]*\)\([a-zA-Z]*\)/\3 \2 \1 \2\3/' info.txt |
sort -k1,1 -k2,2nr |
cut -d' ' -f3-
  1. sed prepends two new columns: the first holds the letter from the second column, the second holds the number from the second column.
  2. Then we sort by the first column alphabetically, and within it by the second column numerically in reverse.
  3. Finally, cut removes the two added columns.
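Why the key boundaries in step 2 matter, as a small sketch on three already-decorated lines (`decorated`, `bounded` and `unbounded` are names made up for the demo): `-k1` without an end field extends the key to the end of the line, so whole lines are compared as text and `d 1` lands before `d 7`; `-k1,1` limits the first key to the letter and lets `-k2,2nr` order the numbers.

```shell
#!/usr/bin/env bash
# Decorated lines as produced by the sed step: "<letter> <number> <original line>".
decorated='d 1 files-247106034 1d
d 7 files-812888216 7d
h 19 files-550519470 19h'

# Keys bounded to one field each: letter first, then number in reverse.
bounded() { printf '%s\n' "$decorated" | LC_ALL=C sort -k1,1 -k2,2nr; }

# A bare -k1 runs to the end of the line, so whole lines are compared as
# text ("d 1 ..." < "d 7 ...") and the -k2nr key never gets a tie to break.
unbounded() { printf '%s\n' "$decorated" | LC_ALL=C sort -k1 -k2nr; }

bounded
```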

I'm leaving the old answer below for reference.

This was my first idea; it works, but it is definitely not the best:

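sed 's/\(.*\) \([0-9]*\)\([a-zA-Z]*\)/\3 \2 \1 \2\3/' info.txt |
sort -k1 |
{
    # Group consecutive lines with the same suffix letter and reverse-sort
    # each group numerically when the letter changes (bash: += and $'\n').
    presuffix=''
    buff=''
    while IFS=' ' read -r suffix rest; do
        if [ "$presuffix" != "$suffix" ]; then
            printf '%s' "$buff" | sort -n -r -k1
            presuffix=$suffix
            buff=''
        fi
        buff+="$rest"$'\n'
    done
    # Flush the last group.
    printf '%s' "$buff" | sort -n -r -k1
} |
cut -d' ' -f2-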
  1. The sed command copies the number and letter from the second column to the front of the line, so a line ending in 1d is prepended with d 1 followed by the rest of the line. The line now has two new columns: one we want to sort alphabetically and one we want to sort numerically.
  2. Then we sort by the first column (the letter).
  3. Then I split the stream into groups, one per letter, using a buffer, and reverse-sort each group numerically on the number. The letter is consumed by the while read, so the number is now the first column.
  4. Then cut -d' ' -f2- removes the first column (the number).
  5. This will be slow because of the while read loop, but I have no better idea.
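If the per-line shell loop is the bottleneck, the same grouping idea can be kept while moving the loop into awk. This is a swapped-in technique, not the answer's code, and `group_sort` is a hypothetical name: awk prints each letter group into a `sort -n -r` pipe and calls `close()` when the letter changes, which waits for that sort to finish, so the groups come out in letter order. The final `cut` removes the number column left in front.

```shell
#!/usr/bin/env bash
# Hypothetical alternative to the while-read loop above (not the answer's
# code): awk streams each letter group into its own "sort -n -r" process.
# sed: "files-812888216 7d" -> "d 7 files-812888216 7d"; sort groups by letter.
group_sort() {
  sed 's/\(.*\) \([0-9]*\)\([a-zA-Z]*\)/\3 \2 \1 \2\3/' |
  LC_ALL=C sort -k1,1 |
  awk -v cmd='sort -n -r -k1,1' '
    # A new letter: close the previous sort so it prints its group first.
    $1 != prev { if (NR > 1) close(cmd); prev = $1 }
    # Drop the letter column and hand "<number> <original line>" to sort.
    { sub(/^[^ ]* /, ""); print | cmd }
    END { close(cmd) }
  ' |
  cut -d' ' -f2-
}
```

Usage: `group_sort < info.txt`. Lines with equal numbers (such as the two 1d lines) may come out in a different order than in the expected output, because ties inside each `sort -n -r` are broken by a reversed whole-line comparison, just as in the loop above.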

@edit:

Another solution, heavily inspired by @shelter's comment:

sed 's/\(.*\) \([0-9]*\)\([a-zA-Z]*\)/\3 \2 \1 \2\3/' info.txt |
while IFS=' ' read -r suffix num rest; do
    # key = ASCII code of the suffix * 256 + (256 - number)
    echo "$(printf "%d * 256 + (256 - %d)\n" "'$suffix" "$num" | bc)" "$rest"
done |
sort -n |
cut -d' ' -f2-

Assuming there is only a single-character suffix in the sorted column (1d, 1e, 1h or 19d) and the numbers in the sorted column are smaller than 256 (a magic number that may be increased), we can convert the character into its ASCII code.

Then we multiply the ASCII code by 256 and add 256 minus the number from the sorted column. The number is subtracted from 256 because within each chunk we want to sort the numbers in reverse (7d first, then 1d). Then we just sort numerically.
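A worked version of the key, as a sketch (`sort_key` is a made-up helper, and it uses shell arithmetic in place of bc). With 'd' = 100, 'h' = 104 and 'm' = 109, every d key is below every h key, and inside the d block a larger number gives a smaller key, so an ascending numeric sort yields 7d, then 1d, then 19h, then 3m.

```shell
#!/usr/bin/env bash
# Hypothetical helper computing the key described above:
#   ASCII(suffix) * 256 + (256 - number)
# with shell arithmetic instead of bc.
sort_key() {
  local suffix=$1 num=$2 code
  # A leading single quote makes printf print the character's numeric code.
  code=$(printf '%d' "'$suffix")
  echo $(( code * 256 + 256 - num ))
}

sort_key d 7    # 25849 (100 * 256 + 249)
sort_key d 1    # 25855 (100 * 256 + 255)
sort_key h 19   # 26861 (104 * 256 + 237)
sort_key m 3    # 28157 (109 * 256 + 253)
```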

Alternatively, we could use printf "(256 - %d) * 256 + %d" and then sort numerically in reverse; the only difference is the order of lines whose keys are equal (e.g. files-1662192679 and files-247106034).
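The tie case, as a sketch on two precomputed keys (`ties`, `tie_forward` and `tie_reverse` are made-up names; both lines are 1d, so both keys are 100 * 256 + 255). When the numbers are equal, sort compares the whole lines as a last resort, and GNU sort documents that `-r` reverses that comparison as well, so the two variants list the two files in opposite orders.

```shell
#!/usr/bin/env bash
# Two lines whose numeric keys are equal (both are 1d: 100 * 256 + 255).
ties='25855 files-1662192679 1d
25855 files-247106034 1d'

# When the numeric keys tie, sort falls back to comparing whole lines as
# bytes, and -r reverses that fallback too.
tie_forward() { printf '%s\n' "$ties" | LC_ALL=C sort -n; }
tie_reverse() { printf '%s\n' "$ties" | LC_ALL=C sort -r -n; }

tie_forward   # files-1662192679 first
tie_reverse   # files-247106034 first
```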

The magic number 256 should be greater than the biggest number in the sorted column and also greater than the biggest ASCII code of any character in the sorted column. This could probably be extended to handle multiple characters in the sorted column.

Upvotes: 1
