bykubyk

Reputation: 3

UNIX group by two values

I have a file with the following lines (values are separated by ";"):

dev_name;dev_type;soft
name1;ASR1;11.1
name2;ASR1;12.2
name3;ASR1;11.1
name4;ASR3;15.1

I know how to group them by one value (for example, counting all ASRx entries), but how can I group them by two values, for example:

ASR1
    *11.1 - 2
    *12.2 - 1
ASR3 
    *15.1 - 1

Upvotes: 0

Views: 151

Answers (6)

Paul Hodges

Reputation: 15368

I don't want to encourage lazy questions, but I wrote a solution, and I'm sure someone can point out improvements. I love posting answers on this site because I learn so much. :)

The only external calls are sed and sort; everything else uses shell built-ins. That means using read, which is slow. If your file is large, I'd recommend rewriting the loop in awk or perl, but this will get the job done.

sed 1d groups |                        # strip the header
  sort -t';' -k2,3 > group.srt         # pre-sort to collect groupings
declare -i ctr=0                       # initialize integer record counter
IFS=';' read x lastA lastB < group.srt # priming read for comparators
printf "$lastA\n\t*$lastB - "          # priming print (assumes at least one record)
while IFS=';' read x a b               # loop through the file
do if [[ "$lastA" < "$a" ]]            # on every MAJOR change
   then printf "$ctr\n$a\n\t*$b - "    # print total, new MAJOR header and MINOR header
        lastA="$a"                     # update the MAJOR comparator
        lastB="$b"                     # update the MINOR comparator
        ctr=1                          # reset the counter
   elif [[ "$lastB" < "$b" ]]          # on every MINOR change
   then printf "$ctr\n\t*$b - "        # print total and MINOR header
        lastB="$b"                     # update the MINOR comparator
        ctr=1                          # reset the counter
   else (( ctr++ ))                    # otherwise increment
   fi
done < group.srt                       # feed read from sorted file
printf "$ctr\n"                        # print final group total at EOF
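
For larger files, the same control-break logic could be moved into awk, roughly as sketched below. This is only a sketch under the same assumptions as above (a header line, three ";"-separated fields); the function names group_sorted and emit are made up for illustration.

```shell
# Hypothetical helper: group_sorted FILE
# Assumes a header line and ';'-separated fields: name;type;soft
group_sorted() {
  tail -n +2 "$1" |                   # strip the header
    sort -t';' -k2,3 |                # bring equal (type, soft) pairs together
    awk -F';' '
      function emit() {               # print the finished MINOR group, if any
        if (ctr) printf "\t*%s - %d\n", lastB, ctr
      }
      NR == 1 || $2 != lastA {        # MAJOR change: close old group, print new header
        emit(); print $2
        lastA = $2; lastB = $3; ctr = 0
      }
      $3 != lastB {                   # MINOR change within the same MAJOR
        emit(); lastB = $3; ctr = 0
      }
      { ctr++ }                       # count the current record
      END { emit() }                  # close the last group
    '
}
```

Call it as group_sorted groups. The sort still does the heavy lifting; awk only watches for changes in the second and third fields, so nothing is held in memory beyond the current group.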

Upvotes: 0

Shawn

Reputation: 52529

Yet Another Solution, this one using the always useful GNU datamash to count the groups:

$ datamash -t ';' --header-in -sg 2,3 count 3 < input.txt |
   awk -F';' '$1 != curr { curr = $1; print $1 } { print "\t*" $2 " - " $3 }' 
ASR1
    *11.1 - 2
    *12.2 - 1
ASR3
    *15.1 - 1

Upvotes: 0

karakfa

Reputation: 67507

another awk

$ awk -F';' 'NR>1 {a[$2]; b[$3]; c[$2,$3]++} 
             END  {for(k in a) {print k; 
                                for(p in b) 
                                   if(c[k,p]) print "\t*"p,"-",c[k,p]}}' file
ASR1
        *11.1 - 2
        *12.2 - 1
ASR3
        *15.1 - 1

Upvotes: 1

Ed Morton

Reputation: 203985

$ cat tst.awk
BEGIN { FS=";"; OFS=" - " }
NR==1 { next }
$2 != prev { prt(); prev=$2 }
{ cnt[$3]++ }
END { prt() }

function prt(   soft) {
    if ( prev != "" ) {
        print prev
        for (soft in cnt) {
            print "    *" soft, cnt[soft]
        }
        delete cnt
    }
}

$ awk -f tst.awk file
ASR1
    *11.1 - 2
    *12.2 - 1
ASR3
    *15.1 - 1

Or if you like pipes....

$ tail -n +2 file | cut -d';' -f2- | sort | uniq -c |
    awk -F'[ ;]+' '{print ($3!=prev ? $3 ORS : "") "    *" $4 " - " $2; prev=$3}'
ASR1
    *11.1 - 2
    *12.2 - 1
ASR3
    *15.1 - 1

Upvotes: 0

stack0114106

Reputation: 8711

Using Perl

$ cat bykub.txt
dev_name;dev_type;soft
name1;ASR1;11.1
name2;ASR1;12.2
name3;ASR1;11.1
name4;ASR3;15.1
$ perl -F";" -lane ' $kv{$F[1]}{$F[2]}++ if $.>1;END { while(($x,$y) = each(%kv)) { print $x;while(($p,$q) = each(%$y)){ print "\t\*$p - $q" }}}' bykub.txt
ASR1
        *11.1 - 2
        *12.2 - 1
ASR3
        *15.1 - 1
$

Upvotes: 0

NeronLeVelu

Reputation: 10039

Try something like this (asorti requires GNU awk):

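awk -F ';' '
   NR==1{next}
   {aRaw[$2 ";" $3]++}                 # ";" cannot occur inside a field, so it is a safe joiner
   END {
      n = asorti( aRaw, aVal)          # sorted keys land in aVal[1..n]
      for( i = 1; i <= n; i++) {       # walk by index; "for (x in arr)" order is unspecified
         split( aVal[i], aTmp, /;/ )
         if ( aTmp[1] != Last ) { Last = aTmp[1]; print Last }
         print "    *" aTmp[2] " - " aRaw[ aVal[i] ]
         }
      }
   ' YourFile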

The key here is to use both fields together as a single array index. Counting is the easy part; most of the END block is there to present the values in the requested layout.
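
If GNU awk is not available, the same two-field index can be built with awk's standard SUBSEP and the ordering left to sort. A rough sketch, assuming the same header line and three ";"-separated fields; the function name group_pairs is made up for illustration.

```shell
# Hypothetical helper: group_pairs FILE
# Assumes a header line and ';'-separated fields: name;type;soft
group_pairs() {
  awk -F';' '
    NR > 1 { cnt[$2, $3]++ }            # ($2, $3) is joined with SUBSEP into one index
    END {
      for (key in cnt) {                # iteration order is unspecified
        split(key, part, SUBSEP)        # recover the two fields from the index
        print part[1] ";" part[2] ";" cnt[key]
      }
    }
  ' "$1" |
    sort -t';' -k1,2 |                  # order by type, then soft
    awk -F';' '$1 != last { last = $1; print $1 } { print "\t*" $2 " - " $3 }'
}
```

Call it as group_pairs YourFile. The first awk only counts, sort fixes the order, and the second awk prints a header line whenever the type changes.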

Upvotes: 0
