Node.JS

Reputation: 1570

uniq -cd but as percentage

I have a file containing these lines:

"RedfishVersion":"1.6.0"
"RedfishVersion":"1.6.0"
"RedfishVersion":"1.6.0"
"RedfishVersion":"1.6.0"
"RedfishVersion":"1.6.0"
"RedfishVersion":"1.6.0"
"RedfishVersion":"1.15.0"
"RedfishVersion":"1.15.0"
"RedfishVersion":"1.15.0"
"RedfishVersion":"1.15.0"
"RedfishVersion":"1.15.0"
"RedfishVersion":"1.15.0"
"RedfishVersion":"1.15.0"

Is there a Unix way to get a percentage histogram of these lines, based on how many times each one is repeated? This is my attempt:

sort bmc-versions.txt | uniq -cd
    321 "RedfishVersion":"1.0.0"
     19 "RedfishVersion":"1.0.2"

I want output like this:

"1.0.0"  50%
"1.0.2"  40%

Upvotes: 0

Views: 225

Answers (1)

jared_mamrot

Reputation: 26484

Sorted by percentage (highest first) using GNU awk:

awk 'BEGIN{FS=":"; PROCINFO["sorted_in"] = "@val_num_desc"} {a[$2]++} END{for (i in a) {print i "  " int(a[i] / NR * 100 + 0.5) "%"}}' test.txt
"1.15.0"  54%
"1.6.0"  46%

The same script, formatted for readability:

awk 'BEGIN {
    FS = ":"
    PROCINFO["sorted_in"] = "@val_num_desc"
}

{
    a[$2]++
}

END {
    for (i in a) {
        print i "  " int(a[i] / NR * 100 + 0.5) "%"
    }
}' test.txt
"1.15.0"  54%
"1.6.0"  46%

Sorted by percentage (highest first) using a non-GNU awk (e.g. POSIX awk), which lacks PROCINFO["sorted_in"]:

awk 'BEGIN{FS=":"} {a[$2]++} END{for (i=NR; i>=0; i--) {for (h in a) {if(a[h] == i) {print h, int(a[h] / NR * 100 + 0.5), "%"}}}}' test.txt
"1.15.0" 54 %
"1.6.0" 46 %

The same script, formatted for readability:

awk 'BEGIN {
    FS = ":"
}

{
    a[$2]++
}

END {
    for (i = NR; i >= 0; i--) {
        for (h in a) {
            if (a[h] == i) {
                print h, int(a[h] / NR * 100 + 0.5), "%"
            }
        }
    }
}' test.txt
"1.15.0" 54 %
"1.6.0" 46 %
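
The nested loop above scans every count from NR down to 0, which may be slow on large files. One possible alternative is to let sort(1) do the ordering instead. This is a minimal sketch under the same assumptions as above (colon-separated lines, version in the second field); the function name version_percentages is hypothetical and only there to make the pipeline reusable:

```shell
# Hypothetical helper: print each distinct value of field 2 with its
# rounded percentage share of all lines, highest percentage first.
version_percentages() {
    awk -F: '
        { count[$2]++ }                       # tally each version string
        END {
            for (v in count)
                # emit "percent<TAB>version" so sort(1) can order on the number
                printf "%d\t%s\n", int(count[v] / NR * 100 + 0.5), v
        }' "$1" |
    sort -k1,1nr |                            # numeric sort, descending
    awk -F'\t' '{ print $2 "  " $1 "%" }'     # reorder to: version  NN%
}
```

Called as version_percentages test.txt, this should work with any POSIX awk, since the ordering is done by sort rather than by PROCINFO["sorted_in"].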

Upvotes: 3
