crashmaxed
crashmaxed

Reputation: 540

Awk: Removing duplicate lines without sorting after matching conditions

I've got a list of devices which I need to remove duplicates (keep only the first occurrence) while preserving order and matching a condition. In this case I'm looking for a specific string and then printing the field with the device name. Here is some example raw data from the sar application:

10:02:01 AM       sdc      0.70      0.00      8.13     11.62      0.00      1.29      0.86      0.06
10:02:01 AM       sda      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
10:02:01 AM       sdb      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdc      1.31      3.73     99.44     78.46      0.02     17.92      0.92      0.12
Average:          sda      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdb      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
10:05:01 AM       sdc      2.70      0.00     39.92     14.79      0.02      5.95      0.31      0.08
10:05:01 AM       sda      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
10:05:01 AM       sdb      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
10:06:01 AM       sdc      0.83      0.00     10.00     12.00      0.00      0.78      0.56      0.05
11:04:01 AM       sda      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
11:04:01 AM       sdb      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdc      0.70      2.55      8.62     15.91      0.00      1.31      0.78      0.05
Average:          sda      0.12      0.95      0.00      7.99      0.00      0.60      0.60      0.01
Average:          sdb      0.22      1.78      0.00      8.31      0.00      0.54      0.52      0.01

The following will give me the list of devices from lines containing the word "average" but it sorts the output:

sar -dp | awk '/Average/ {devices[$2]} END {for (device in devices) {print device}}'
sda
sdb
sdc

The following gives me exactly what I want (command from here):

sar -dp | awk '/Average/ {print $2}' | awk '!devices[$0]++'
sdc
sda
sdb

Maybe I'm missing something painfully obvious but I can't figure out how to do the same in one awk command, that is without piping the output of the first awk into the second awk.

Upvotes: 1

Views: 389

Answers (2)

Jotne
Jotne

Reputation: 41446

You can do:

sar -dp | awk '/Average/ && !devices[$2]++ {print $2}' 
sdc
sda
sdb

The problem is this part for (device in devices). For some reason the for does randomize the output.
I have read a long complicated information on why some where but have not the link.

Upvotes: 3

Etan Reisner
Etan Reisner

Reputation: 80921

awk '/Average/ && !devices[$2]++ {print $2}' sar.in

You just need to combine the two tests. The only caveat is that in the original the entire line is field two from the original input so you need to replace $0 with $2.

Upvotes: 1

Related Questions