Reputation: 123
I want to sort a file in Unix, and for that I am using this command:
sort file --field-separator=' ' --key=7,7
But the position of this field is not fixed: sometimes it is the 7th field, sometimes the 6th or the 8th.
Is it possible to sort the file based on a field name, something like
sort file --field-separator=' ' --keyname=<my_unique_id>
The file looks something like this; I want to sort on the basis of party_id:
status_date="2000-01-31" ref_date="2021-03-31" ead_percent="0.00365316" accounting_standard="IFRS" party_default_status_cd="NOTDFLT" party_id="36113477" v_src_system_id="ABC"
status_date="2002-12-31" ref_date="2021-03-31" ead_percent="1" accounting_standard="IFRS" orig_src_system_id="GRD" party_default_status_cd="UNLIKE" party_id="36053415" v_src_system_id="XYZ"
Upvotes: 0
Views: 554
Reputation: 570
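With GNU awk (gawk) one may specify how arrays are traversed. The following saves each line in an array, using party_id=XYZ as the respective index, and then prints the array sorted by those indices (compared as strings, in ascending order). Because every line is held in memory, this is limited by RAM for very large files; also, lines that share the same party_id overwrite each other, so only the last one is kept.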
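gawk '{ match($0, /party_id=[^ ]*/, id)   # id[0] = the matched party_id="..." text
        arr[id[0]] = $0 }                 # index each line by its party_id
END { PROCINFO["sorted_in"] = "@ind_str_asc"   # traverse indices in ascending string order
      for (i in arr) print arr[i]
}' infile.txt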
Upvotes: 0
Reputation: 203324
Using the Decorate/Sort/Undecorate idiom, and assuming that, like in the example you provided, your quoted strings don't contain blanks, =, or " characters:
Decorate:
$ awk -F'[ ="]+' -v OFS='\t' -v keyname='party_id' '{for (i=1; i<NF; i+=2) if ($i == keyname) { print $(i+1), $0; next} }' file
36113477 status_date="2000-01-31" ref_date="2021-03-31" ead_percent="0.00365316" accounting_standard="IFRS" party_default_status_cd="NOTDFLT" party_id="36113477" v_src_system_id="ABC"
36053415 status_date="2002-12-31" ref_date="2021-03-31" ead_percent="1" accounting_standard="IFRS" orig_src_system_id="GRD" party_default_status_cd="UNLIKE" party_id="36053415" v_src_system_id="XYZ"
Decorate then Sort:
$ awk -F'[ ="]+' -v OFS='\t' -v keyname='party_id' '{for (i=1; i<NF; i+=2) if ($i == keyname) { print $(i+1), $0; next} }' file |
sort -k1,1n
36053415 status_date="2002-12-31" ref_date="2021-03-31" ead_percent="1" accounting_standard="IFRS" orig_src_system_id="GRD" party_default_status_cd="UNLIKE" party_id="36053415" v_src_system_id="XYZ"
36113477 status_date="2000-01-31" ref_date="2021-03-31" ead_percent="0.00365316" accounting_standard="IFRS" party_default_status_cd="NOTDFLT" party_id="36113477" v_src_system_id="ABC"
Decorate then Sort then Undecorate:
$ awk -F'[ ="]+' -v OFS='\t' -v keyname='party_id' '{for (i=1; i<NF; i+=2) if ($i == keyname) { print $(i+1), $0; next} }' file |
sort -k1,1n |
cut -f2-
status_date="2002-12-31" ref_date="2021-03-31" ead_percent="1" accounting_standard="IFRS" orig_src_system_id="GRD" party_default_status_cd="UNLIKE" party_id="36053415" v_src_system_id="XYZ"
status_date="2000-01-31" ref_date="2021-03-31" ead_percent="0.00365316" accounting_standard="IFRS" party_default_status_cd="NOTDFLT" party_id="36113477" v_src_system_id="ABC"
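The three steps above can be wrapped in a reusable shell function that takes the field name as a parameter, which comes close to the --keyname option asked about. This is a minimal sketch, assuming a POSIX sh, awk, sort and cut; the function name sort_by_field and its argument order are hypothetical. Instead of splitting on [ ="]+, it uses awk's two-argument match(), so other values may contain blanks, and it requires a blank (or the start of the line) before the key so that, e.g., orig_party_id is not mistaken for party_id. Lines without the key get an empty prefix and therefore sort first.

```sh
#!/bin/sh
# sort_by_field KEY SORTFLAGS [FILE...]
# Hypothetical helper, not a standard command: sorts lines of key="value"
# pairs by the value of KEY, wherever that pair appears in the line.
# SORTFLAGS is appended to the sort key spec (e.g. n for numeric, '' for text).
# Assumes KEY contains no regex metacharacters and values contain no tabs.
#
# Decorate: awk prints the value of KEY, a tab, then the original line.
#   A blank is prepended so a key at the start of the line also matches,
#   and the [ \t] before KEY stops orig_party_id from matching party_id.
# Sort: compare only the first tab-separated field.
# Undecorate: cut drops the prefix and its tab.
sort_by_field() {
    key=$1
    flags=$2
    shift 2
    tab=$(printf '\t')
    awk -v key="$key" '{
        s = " " $0
        v = ""
        if (match(s, "[ \t]" key "=\"[^\"]*\""))
            v = substr(s, RSTART + length(key) + 3, RLENGTH - length(key) - 4)
        print v "\t" $0
    }' "$@" |
    sort -t "$tab" -k "1,1$flags" |
    cut -f 2-
}
```

For the example file, sort_by_field party_id n file sorts numerically on party_id; passing '' instead of n gives a plain text comparison.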
Upvotes: 2
Reputation: 189367
sort doesn't have a concept of named keys, but you can perform a Schwartzian transform: temporarily add the key as a prefix to the line, sort on the first field, then discard the prefix.
sed 's/\(.*\)\(party_id="[^"]*"\)/\2\t\1\2/' file |
sort -t $'\t' -k1,1 |
cut -f2-
(\t in the sed replacement is a GNU sed extension, and $'\t' needs bash, ksh or zsh; elsewhere, type a literal tab in both places instead.)
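To avoid having to type a literal tab, a POSIX sh can store one in a variable with printf and reuse it in both the sed replacement and the sort -t argument. This is a minimal sketch of the same transform, assuming a POSIX sh; the function name sort_by_party_id is hypothetical, and like the command above it compares the party_id="..." text lexically.

```sh
#!/bin/sh
# Hypothetical wrapper around the sed | sort | cut transform.
# Reads the named files, or standard input if none are given.
sort_by_party_id() {
    # printf turns \t into a real tab; $( ) only strips trailing newlines.
    tab=$(printf '\t')
    # Decorate: copy party_id="..." to the front of the line, then a tab.
    # Sort: compare only the first tab-separated field.
    # Undecorate: cut drops everything up to and including the first tab.
    sed "s/\(.*\)\(party_id=\"[^\"]*\"\)/\2$tab\1\2/" "$@" |
    sort -t "$tab" -k1,1 |
    cut -f2-
}
```

Usage: sort_by_party_id file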
Upvotes: 2