Neha

Reputation: 123

How to sort a file by key name instead of field position in Unix?

I want to sort a file in Unix, and I am currently using this command:

sort file --field-separator=' ' --key=7,7

But the position of this field is not fixed: it can be the 6th, 7th, or 8th field in the line.

Is it possible to sort the file based on a field name instead, with something like

sort file --field-separator=' ' --keyname=<my_unique_id>

The file looks something like this; I want to sort on party_id:

status_date="2000-01-31" ref_date="2021-03-31" ead_percent="0.00365316" accounting_standard="IFRS" party_default_status_cd="NOTDFLT" party_id="36113477" v_src_system_id="ABC"
status_date="2002-12-31" ref_date="2021-03-31" ead_percent="1" accounting_standard="IFRS" orig_src_system_id="GRD" party_default_status_cd="UNLIKE" party_id="36053415" v_src_system_id="XYZ"

Upvotes: 0

Views: 554

Answers (3)

FelixJN

Reputation: 570

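With GNU awk (gawk) you can specify how arrays are traversed. The following stores each line in an array, using its party_id="..." string as the index, and then prints the array sorted by those indices (compared as strings). The whole file is held in memory, so very large files are limited by RAM, and lines that share a party_id overwrite each other.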

awk '{match($0,/party_id=[^ ]*/,id) ; arr[id[0]]=$0}
     END {PROCINFO["sorted_in"]="@ind_str_asc"
          for (i in arr) {print arr[i]}
     }' infile.txt
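
If several lines can share the same party_id, a plain arr[key]=$0 assignment keeps only the last of them. A minimal sketch of one way around that, assuming gawk is installed, is to append lines that share an index instead of replacing them. The function name sort_by_party_id_gawk is made up for illustration:

```shell
# Hypothetical helper: sort lines by their party_id="..." string using gawk,
# keeping every line even when several share the same party_id.
sort_by_party_id_gawk() {
    gawk '
        # Only lines that actually contain party_id=... are stored.
        match($0, /party_id=[^ ]*/, id) {
            key = id[0]
            # Append to any earlier lines stored under the same key.
            arr[key] = (key in arr) ? arr[key] ORS $0 : $0
        }
        END {
            # Traverse indices in ascending string order (gawk-specific).
            PROCINFO["sorted_in"] = "@ind_str_asc"
            for (key in arr) print arr[key]
        }' "$@"
}
```

Call it as sort_by_party_id_gawk infile.txt. Lines with the same party_id come out in their original relative order; lines without a party_id are dropped, and the whole input is still held in memory.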

Upvotes: 0

Ed Morton

Reputation: 203324

Using the Decorate/Sort/Undecorate idiom and assuming that, like in the example you provided, your quoted strings don't contain blanks, =, or ":

Decorate:

$ awk -F'[ ="]+' -v OFS='\t' -v keyname='party_id' '{for (i=1; i<NF; i+=2) if ($i == keyname) { print $(i+1), $0; next} }' file
36113477        status_date="2000-01-31" ref_date="2021-03-31" ead_percent="0.00365316" accounting_standard="IFRS" party_default_status_cd="NOTDFLT" party_id="36113477" v_src_system_id="ABC"
36053415        status_date="2002-12-31" ref_date="2021-03-31" ead_percent="1" accounting_standard="IFRS" orig_src_system_id="GRD" party_default_status_cd="UNLIKE" party_id="36053415" v_src_system_id="XYZ"

Decorate then Sort:

$ awk -F'[ ="]+' -v OFS='\t' -v keyname='party_id' '{for (i=1; i<NF; i+=2) if ($i == keyname) { print $(i+1), $0; next} }' file |
    sort -k1,1n
36053415        status_date="2002-12-31" ref_date="2021-03-31" ead_percent="1" accounting_standard="IFRS" orig_src_system_id="GRD" party_default_status_cd="UNLIKE" party_id="36053415" v_src_system_id="XYZ"
36113477        status_date="2000-01-31" ref_date="2021-03-31" ead_percent="0.00365316" accounting_standard="IFRS" party_default_status_cd="NOTDFLT" party_id="36113477" v_src_system_id="ABC"

Decorate then Sort then Undecorate:

$ awk -F'[ ="]+' -v OFS='\t' -v keyname='party_id' '{for (i=1; i<NF; i+=2) if ($i == keyname) { print $(i+1), $0; next} }' file |
    sort -k1,1n |
    cut -f2-
status_date="2002-12-31" ref_date="2021-03-31" ead_percent="1" accounting_standard="IFRS" orig_src_system_id="GRD" party_default_status_cd="UNLIKE" party_id="36053415" v_src_system_id="XYZ"
status_date="2000-01-31" ref_date="2021-03-31" ead_percent="0.00365316" accounting_standard="IFRS" party_default_status_cd="NOTDFLT" party_id="36113477" v_src_system_id="ABC"
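
If you need this for more than one key, the three stages can be wrapped in a small shell function. This is only a sketch under the same assumptions as above (no blanks, = or " inside the quoted values, and a numeric key); the name sort_by_key is made up for illustration:

```shell
# Hypothetical helper: sort_by_key KEYNAME [FILE...]
# Decorate each line with the value of KEYNAME, sort numerically on it,
# then strip the decoration again.
sort_by_key() {
    keyname=$1
    shift
    awk -F'[ ="]+' -v OFS='\t' -v keyname="$keyname" '
        {
            # Fields alternate name, value, name, value, ...
            for (i = 1; i < NF; i += 2)
                if ($i == keyname) { print $(i+1), $0; next }
        }' "$@" |
    sort -k1,1n |
    cut -f2-
}
```

Usage would be sort_by_key party_id file. Note that, as in the pipeline above, lines that don't contain the key at all are not printed by the awk stage, so they are missing from the output.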

Upvotes: 2

tripleee

Reputation: 189367

sort doesn't have a concept of named keys, but you can perform a Schwartzian transform to temporarily add the key as a prefix to the line, sort on the first field, then discard it.

sed 's/\(.*\)\(party_id="[^"]*"\)/\2    \1\2/' file |
sort -t '   ' -k1,1 |
cut -f2-

(The whitespace between the first two backreferences in the sed replacement, and in the sort -t argument, is a literal tab, which Stack Overflow renders as a sequence of spaces.)
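
Since a literal tab is easy to lose when copying and pasting, here is a sketch of the same pipeline with the tab generated by printf and kept in a variable. The function name sort_by_party_id and the variable tab are made up for illustration:

```shell
# Hypothetical helper: same sed/sort/cut pipeline, but with the tab
# produced by printf so nothing depends on pasting a literal tab.
sort_by_party_id() {
    tab=$(printf '\t')
    # Prefix each line with its party_id="..." string and a tab.
    sed "s/\(.*\)\(party_id=\"[^\"]*\"\)/\2${tab}\1\2/" "$@" |
    # Sort on the prefix only (string comparison, not numeric).
    sort -t "$tab" -k1,1 |
    # Drop the prefix again.
    cut -f2-
}
```

Run it as sort_by_party_id file. The comparison here is on the string party_id="...", so IDs of different lengths sort lexically rather than numerically.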

Upvotes: 2
