Utsav

Reputation: 8093

Unix how to sort whole row based on a particular match

I have a file like this

$> cat testfile.txt
abc_xyz_2a      foo
dft_pqr_abc_5c  bar
pqr_ijk_1a      alpha
efg_5b          beta
ijk_pqr_5a      gamma
pqr_ijk_1b      alpha

I want to sort the rows by the part of the first column that follows the last underscore (_), i.e. in the order 1a, 1b, 2a, 5a, 5b, 5c.

So this is my expected output.

pqr_ijk_1a      alpha
pqr_ijk_1b      alpha
abc_xyz_2a      foo
ijk_pqr_5a      gamma
efg_5b          beta
dft_pqr_abc_5c  bar

Could someone please suggest a way to achieve the expected output?

What I tried

I tried extracting the part after the last underscore of the first column and sorting it, but that prints only those keys, not the whole lines.

$> awk '{print $1}' testfile.txt|rev|awk -F_ '{print $1}'|rev|sort
1a
1b
2a
5a
5b
5c

I guess there could be a way to record the line numbers somehow and print the lines in that order? I tried some trial and error with NR in awk, without success.

Edit: Added a row in file ending with 1b to handle another case. Changed expected output based on it.

Upvotes: 4

Views: 175

Answers (3)

Ajinkya

Reputation: 83

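You can try the command below, which is simpler and more straightforward. Note that it sorts on the reversed key, so the trailing letter is compared before the digit: a key such as 1b would sort after 5a, not after 1a.

 rev test.txt | sort -k2 | rev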


pqr_ijk_1a      alpha
abc_xyz_2a      foo
ijk_pqr_5a      gamma
efg_5b          beta
dft_pqr_abc_5c  bar

Upvotes: 0

riteshtch

Reputation: 8769

Just pull the sort keys out into extra leading columns, sort on them, and then strip them off again. This assumes the key is exactly one digit followed by one letter.

$ cat data
abc_xyz_2a      foo
dft_pqr_abc_5c  bar
pqr_ijk_1a      alpha
efg_5b          beta
ijk_pqr_5a      gamma


$ awk '{print substr($1, length($1)-1, 1), substr($1, length($1)), $1, $2}' data | sort -n -k1,2 | awk '{print $3,$4}'
pqr_ijk_1a alpha
abc_xyz_2a foo
ijk_pqr_5a gamma
efg_5b beta
dft_pqr_abc_5c bar

Here is what happens at each step of the pipeline:

$ awk '{print substr($1, length($1)-1, 1), substr($1, length($1)), $1, $2}' data
2 a abc_xyz_2a foo
5 c dft_pqr_abc_5c bar
1 a pqr_ijk_1a alpha
5 b efg_5b beta
5 a ijk_pqr_5a gamma

$ awk '{print substr($1, length($1)-1, 1), substr($1, length($1)), $1, $2}' data | sort -n -k1,2
1 a pqr_ijk_1a alpha
2 a abc_xyz_2a foo
5 a ijk_pqr_5a gamma
5 b efg_5b beta
5 c dft_pqr_abc_5c bar
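The same decorate-sort-undecorate idea can be sketched a little more generally, so that multi-digit numbers work and the original spacing of each line is kept. This is only a sketch under stated assumptions: the function name sort_by_suffix is made up for illustration, and the key after the last underscore is assumed to be digits optionally followed by letters.

```shell
# Hypothetical helper: sort lines by the part of column 1 after the last "_".
# Assumption: that part looks like <digits><letters>, e.g. 1a, 5c, 12ab.
sort_by_suffix() {
    awk '{
        n = split($1, parts, "_")              # parts[n] is the text after the last underscore
        key = parts[n]
        num = key
        sub(/[^0-9].*$/, "", num)              # keep only the leading digits, e.g. "12" from "12ab"
        rest = substr(key, length(num) + 1)    # whatever follows the digits, e.g. "ab"
        printf "%s\t%s\t%s\n", num, rest, $0   # decorate: two key fields, then the untouched line
    }' "$@" |
    sort -t "$(printf '\t')" -k1,1n -k2,2 |    # numeric on the digits, then plain text on the letters
    cut -f3-                                   # undecorate: drop the two key fields
}
```

Calling `sort_by_suffix testfile.txt` on the file from the question should print the rows in the order 1a, 1b, 2a, 5a, 5b, 5c, with each line otherwise unchanged.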

Upvotes: 2

anubhava

Reputation: 785058

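If you have GNU awk (gawk), you can use PROCINFO["sorted_in"] to control the order in which an array is traversed. Note that rows sharing the same key would overwrite each other in the array: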

awk 'BEGIN{PROCINFO["sorted_in"] = "@ind_num_asc"} {
   n=split($1, a, "_")
   data[a[n]]=$0
}
END {
   for (i in data)
      print data[i]
}' file

pqr_ijk_1a      alpha
abc_xyz_2a      foo
ijk_pqr_5a      gamma
efg_5b          beta
dft_pqr_abc_5c  bar

Otherwise, you can use an awk-sort-cut pipeline:

awk '{n=split($1, a, "_"); print $0 "\0" a[n]}' file | sort -t '\0' -k2 | cut -d $'\0' -f1

pqr_ijk_1a      alpha
abc_xyz_2a      foo
ijk_pqr_5a      gamma
efg_5b          beta
dft_pqr_abc_5c  bar
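If neither gawk nor a NUL-aware sort/cut is available, the same idea can be sketched in plain awk by remembering each line and its key under its line number (NR) and sorting those in the END block. This is a rough sketch, not a tuned solution: the names sort_by_suffix_awk and after are made up for illustration, the key is assumed to start with a number, and the insertion sort is quadratic, so it only suits small files.

```shell
# Hypothetical helper: sort lines by the part of column 1 after the last "_",
# using only awk (no PROCINFO, no external sort).
sort_by_suffix_awk() {
    awk '
    {
        n = split($1, parts, "_")
        key[NR]  = parts[n]     # sort key for this line, e.g. "5b"
        line[NR] = $0           # the whole line, kept as-is
    }
    # Return 1 when key x should come after key y:
    # compare the leading numbers first, then fall back to a string comparison.
    function after(x, y) {
        if (x + 0 != y + 0) return (x + 0 > y + 0)
        return ((x "") > (y ""))
    }
    END {
        # Stable insertion sort over the stored keys and lines.
        for (i = 2; i <= NR; i++) {
            k = key[i]; l = line[i]
            for (j = i - 1; j >= 1 && after(key[j], k); j--) {
                key[j + 1] = key[j]; line[j + 1] = line[j]
            }
            key[j + 1] = k; line[j + 1] = l
        }
        for (i = 1; i <= NR; i++) print line[i]
    }' "$@"
}
```

Because lines are stored by line number rather than by key, rows that share a key are all kept, in their original relative order. Usage would be `sort_by_suffix_awk file`.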

Upvotes: 2
