stevo

Reputation: 181

Sort numerically by part of a string

I have a list of strings that I need to sort by a number within each string, e.g.:

<sbb part="611-0068-01" desc="21.6TB HDD  2.5" qty="1"/>
<sbb desc="19.2TB SSD/2.5in" part="611-0112-01" qty="1"/>
<sbb part="611-0112-01" qty="1" desc="19.2TB SSD/2.5in"/>
<sbb part="611-0112-02" desc="19.2TB SSD/2.5in" qty="1"/>
<sbb part="611-0044-01" qty="1" desc="4.8TB SSD  2.5"/>
<sbb part="611-0044-03" desc="4.8TB SSD  2.5" qty="1"/>
<sbb desc="9.6T SSD/2.5in" part="611-0202-01" qty="1" />

The part I want to sort by is the XXXX in "611-XXXX-XX", which appears in quotes in each string. For example, 611-1111-03 sorts before 611-2222-02 because 1111 is lower than 2222.

All strings contain this 611-XXXX-XX number and this number always starts with 611.

This number can occur near the start of the string or near the end. Unfortunately, each string also contains two other quoted values, which makes this more complex.

Output for this example:

<sbb part="611-0044-01" qty="1" desc="4.8TB SSD  2.5"/>
<sbb part="611-0044-03" desc="4.8TB SSD  2.5" qty="1"/>
<sbb part="611-0068-01" desc="21.6TB HDD  2.5" qty="1"/>
<sbb desc="19.2TB SSD/2.5in" part="611-0112-01" qty="1"/>
<sbb part="611-0112-01" qty="1" desc="19.2TB SSD/2.5in"/>
<sbb part="611-0112-02" desc="19.2TB SSD/2.5in" qty="1"/>
<sbb desc="9.6T SSD/2.5in" part="611-0202-01" qty="1" />

I was thinking of searching from 611 up to the next quote. Not sure how to code that up though as I'm a bash newbie.
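
A minimal sketch of that "from 611 up to the next quote" idea, using only shell parameter expansion. The function name part_key is made up for illustration, and the sketch assumes the first `"611-` in the line is the part number:

```shell
# part_key (hypothetical helper): print the XXXX-XX portion of 611-XXXX-XX.
# The body runs in a subshell, so the temporary variables do not leak.
part_key() (
  # Strip the shortest prefix ending in "611- (everything before the key).
  rest=${1#*\"611-}
  # Strip the longest suffix starting at a quote (everything after the key).
  key=${rest%%\"*}
  printf '%s\n' "$key"
)

part_key '<sbb desc="19.2TB SSD/2.5in" part="611-0112-01" qty="1"/>'
```

This only extracts the key for one line; the sorting itself still has to be done separately.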

Upvotes: 1

Views: 99

Answers (2)

Michael Back

Reputation: 1871

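Here is a single awk script. Because it does not pipe several tools together, it should be faster than a pipeline. It uses asort(), which is a GNU awk (gawk) extension.

awk 'BEGIN { split("", r); n = 0 }
     /part="611-/ {
         x = $0
         sub(/.*part="611-/, "", x)   # drop everything before the key
         sub(/".*/, "", x)            # drop everything after the key
         r[++n] = x "," $0            # prefix the record with its sort key
     }
     END {
         asort(r)                     # gawk only
         for (i = 1; i <= n; i++) { x = r[i]; sub(/^[^,]+,/, "", x); print x }
     }' file

  1. Filter for the part number of interest.
  2. Isolate the relevant section of the part number, tack it on the front of the record as a sorting key, and save the result into an array.
  3. At END, sort the array, remove the sorting key, and print. asort() compares the keys as strings, which matches numeric order here because the numbers are zero-padded to a fixed width.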

Upvotes: 0

Kent

Reputation: 195079

I came up with this one-liner:

 awk '{t=$0;sub(/.*"611-/,"");sub(/-/,"");sub(/".*/,"");
      print "1"$0"\x99"t}' file|sort -n|sed 's/.*\x99//'  

output is:

<sbb part="611-0044-01" qty="1" desc="4.8TB SSD  2.5"/>
<sbb part="611-0044-03" desc="4.8TB SSD  2.5" qty="1"/>
<sbb part="611-0068-01" desc="21.6TB HDD  2.5" qty="1"/>
<sbb desc="19.2TB SSD/2.5in" part="611-0112-01" qty="1"/>
<sbb part="611-0112-01" qty="1" desc="19.2TB SSD/2.5in"/>
<sbb part="611-0112-02" desc="19.2TB SSD/2.5in" qty="1"/>
<sbb desc="9.6T SSD/2.5in" part="611-0202-01" qty="1" />

The idea is:

  • extract the target number and put it in front of each line as the first column (the awk part)
  • hand the result to sort -n and let it do the sorting
  • finally, remove the first column again (the sed part)
  • Note that I used \x99 to separate the first column from the original data; it is an invisible character, which makes the column easy to delete afterwards.
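
The same decorate, sort, undecorate idea can be sketched with a tab as the separator instead of \x99. This is a variant under stated assumptions, not the one-liner above: the name sort_by_part is made up, the key is only the XXXX field, and `sort -s` (stable sort, available in GNU and BSD sort) is assumed so that lines with equal keys keep their input order.

```shell
# sort_by_part (hypothetical helper): read lines on stdin, write them
# ordered numerically by the XXXX field of "611-XXXX-XX".
sort_by_part() {
  # Step 1 (awk): prefix each line with its XXXX key and a tab.
  # Step 2 (sort): numeric, stable sort on the first tab-separated field.
  # Step 3 (cut): drop the key column, leaving the original line.
  awk '{
         key = $0
         sub(/.*"611-/, "", key)   # remove everything up to and including "611-
         sub(/-.*/, "", key)       # remove the -XX suffix and the rest of the line
         print key "\t" $0
       }' |
    sort -t "$(printf '\t')" -k1,1n -s |
    cut -f2-
}
```

It would be called as `sort_by_part < file`. Lines that share the same XXXX value are left in their original relative order rather than being ordered by the trailing -XX part.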

Upvotes: 3
