Lito


Sorting a list in bash by the value of a key=value pair

I have a requests log like this:

[11/Jun/2020:15:35:20 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=72161.647 memory=2 cpu=0.01%
[11/Jun/2020:15:22:13 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=70564.992 memory=2 cpu=0.00%
[11/Jun/2020:15:35:26 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=70252.369 memory=2 cpu=0.00%
[11/Jun/2020:15:01:02 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=60159.409 memory=2 cpu=0.03%
[11/Jun/2020:14:59:03 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=106956.770 memory=2 cpu=0.01%
[11/Jun/2020:15:37:56 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=60014.014 memory=2 cpu=0.00%
[11/Jun/2020:16:45:38 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=61264.044 memory=2 cpu=0.02%
[11/Jun/2020:15:01:48 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=58733.325 memory=2 cpu=0.02%
[11/Jun/2020:15:31:35 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=68882.501 memory=2 cpu=0.03%
[11/Jun/2020:14:59:46 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=57021.375 memory=2 cpu=0.00%
[11/Jun/2020:14:59:46 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=137172.179 memory=2 cpu=0.01%
[11/Jun/2020:15:35:39 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=107954.112 memory=2 cpu=0.00%
[11/Jun/2020:16:12:22 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=55877.479 memory=2 cpu=0.02%
[11/Jun/2020:15:26:19 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=55912.678 memory=2 cpu=0.00%
[11/Jun/2020:15:36:33 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=54738.373 memory=2 cpu=0.02%

I have a script that sorts by time, memory and cpu, but it only works if I strip the static string time= before sorting.

cat /var/log/requests.log | sed -e "s/time=//" | sort -k 7 -n -r | head -50

This gives me:

[11/Jun/2020:14:59:46 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 137172.179 memory=2 cpu=0.01%
[11/Jun/2020:15:35:39 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 107954.112 memory=2 cpu=0.00%
[11/Jun/2020:14:59:03 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 106956.770 memory=2 cpu=0.01%
[11/Jun/2020:15:35:20 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 72161.647 memory=2 cpu=0.01%
[11/Jun/2020:15:22:13 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 70564.992 memory=2 cpu=0.00%
[11/Jun/2020:15:35:26 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 70252.369 memory=2 cpu=0.00%
[11/Jun/2020:15:31:35 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 68882.501 memory=2 cpu=0.03%
[11/Jun/2020:16:45:38 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 61264.044 memory=2 cpu=0.02%
[11/Jun/2020:15:01:02 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 60159.409 memory=2 cpu=0.03%
[11/Jun/2020:15:37:56 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 60014.014 memory=2 cpu=0.00%
[11/Jun/2020:15:01:48 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 58733.325 memory=2 cpu=0.02%
[11/Jun/2020:14:59:46 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 57021.375 memory=2 cpu=0.00%
[11/Jun/2020:15:26:19 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 55912.678 memory=2 cpu=0.00%
[11/Jun/2020:16:12:22 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 55877.479 memory=2 cpu=0.02%
[11/Jun/2020:15:47:01 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX 55443.752 memory=2 cpu=0.02%

I want the same ordering without removing the time= key from the output:

[11/Jun/2020:14:59:46 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=137172.179 memory=2 cpu=0.01%
[11/Jun/2020:15:35:39 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=107954.112 memory=2 cpu=0.00%
[11/Jun/2020:14:59:03 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=106956.770 memory=2 cpu=0.01%
[11/Jun/2020:15:35:20 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=72161.647 memory=2 cpu=0.01%
[11/Jun/2020:15:22:13 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=70564.992 memory=2 cpu=0.00%
[11/Jun/2020:15:35:26 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=70252.369 memory=2 cpu=0.00%
[11/Jun/2020:15:31:35 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=68882.501 memory=2 cpu=0.03%
[11/Jun/2020:16:45:38 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=61264.044 memory=2 cpu=0.02%
[11/Jun/2020:15:01:02 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=60159.409 memory=2 cpu=0.03%
[11/Jun/2020:15:37:56 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=60014.014 memory=2 cpu=0.00%
[11/Jun/2020:15:01:48 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=58733.325 memory=2 cpu=0.02%
[11/Jun/2020:14:59:46 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=57021.375 memory=2 cpu=0.00%
[11/Jun/2020:15:26:19 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=55912.678 memory=2 cpu=0.00%
[11/Jun/2020:16:12:22 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=55877.479 memory=2 cpu=0.02%
[11/Jun/2020:15:47:01 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=55443.752 memory=2 cpu=0.02%

I have tried the following, without success:

cat /var/log/requests.log | sort -k 7.6 -n -r | head -50

Update: /endpoint is a placeholder for real endpoints, which can include a query string.
Update 2: I need to be able to sort numerically on any of the key=value columns.


tripleee


If your sample input is representative, you can simply use = as the field separator instead.

sort -t = -k3 -k4 -k5 -n -r /var/log/requests.log

Notice also how we avoid the useless use of cat; sort can read the file directly.
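A hedged aside on the question's sort -k 7.6 attempt: without -t, sort treats each field as including its leading blank, so field 7 is " time=72161.647" and character 6 is the "=" rather than the first digit; -n then reads every key as 0. Splitting on a single space makes the character offsets line up, and, unlike -t =, it is not thrown off by a query string containing = (with -t =, something like ?page=2 would shift every later field by one). This is a minimal sketch assuming every line has exactly the single-space layout shown in the question; the sample variable and the /endpoint?page=2 line are hypothetical.

```shell
#!/bin/sh
# Sample lines in the question's format; the /endpoint?page=2 line is a
# hypothetical example of a query string that contains "=".
sample='[11/Jun/2020:15:35:20 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=72161.647 memory=2 cpu=0.01%
[11/Jun/2020:14:59:46 +0000] 200 GET /endpoint?page=2 ip=XXX.XXX.XXX.XXX time=137172.179 memory=2 cpu=0.01%
[11/Jun/2020:15:36:33 +0000] 200 GET /endpoint ip=XXX.XXX.XXX.XXX time=54738.373 memory=2 cpu=0.02%'

# With -t ' ', field 7 is exactly "time=..." (no leading blank), so
# character 6 is the first digit. The offsets are len("time=")+1 = 6,
# len("memory=")+1 = 8 and len("cpu=")+1 = 5.
# Keys: time, then memory, then cpu; each numeric (n) and descending (r).
# LC_ALL=C makes sure "." is treated as the decimal point.
sorted=$(printf '%s\n' "$sample" |
    LC_ALL=C sort -t ' ' -k7.6,7nr -k8.8,8nr -k9.5,9nr)

printf '%s\n' "$sorted"
```

Against the real log this would become LC_ALL=C sort -t ' ' -k7.6,7nr -k8.8,8nr -k9.5,9nr /var/log/requests.log | head -50, assuming no line contains extra spaces before the cpu= field.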

More generally, you could use a simple Awk script to extract the sort fields and put them in front of each line, sort on those, and then discard them (a technique known as the Schwartzian transform, or decorate-sort-undecorate).

awk '{ split("", a)    # reset values left over from the previous line
       for (i = 1; i <= NF; ++i) if ($i ~ /^(time|memory|cpu)=/) {
           split($i, f, "="); a[f[1]] = substr($i, length(f[1]) + 2) }
       print a["time"] "\t" a["memory"] "\t" a["cpu"] "\t" $0 }' /var/log/requests.log |
sort -t "$(printf '\t')" -k1,1nr -k2,2nr -k3,3nr |
cut -f4-

The if statement picks out every field that starts with one of the keys we are interested in (you can add more keys here, or switch to a more general regular expression, for example one that matches any run of alphabetic characters followed by an equals sign) and stores its value in the associative array a, indexed by the key name. Once we have looped over all the fields, we print the values from the array in the order we want to sort on, followed by the original line.
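For the second update (sorting on any key=value column), the same decorate-sort-undecorate idea could be wrapped in a small shell function that takes the key name as a parameter. This is a sketch rather than part of the original answer: the function name sort_by_key is hypothetical, only numeric descending order is supported, and a line that lacks the key gets an empty value, which sort -n treats as zero.

```shell
#!/bin/sh
# sort_by_key KEY [FILE...]  (hypothetical helper)
# Sorts lines numerically, largest first, on the value of KEY=...,
# leaving each line intact. Reads standard input if no FILE is given.
# The body runs in a subshell so that no variables leak to the caller.
sort_by_key() (
    key=$1
    shift
    # Decorate: prepend the value of KEY= and a tab (empty if missing).
    awk -v key="$key" '{
        v = ""
        for (i = 1; i <= NF; ++i)
            if (index($i, key "=") == 1) { v = substr($i, length(key) + 2); break }
        print v "\t" $0
    }' "$@" |
    # Sort on the first tab-separated column only, numeric and descending.
    LC_ALL=C sort -t "$(printf '\t')" -k1,1nr |
    # Undecorate: drop the prepended column.
    cut -f2-
)
```

Usage would look like sort_by_key cpu /var/log/requests.log | head -50. Because the key is matched at the start of a whitespace-separated field, a query string such as /endpoint?time=5 is not mistaken for the time= column.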

Demo: https://ideone.com/dU9v95

