user3666956

Reputation: 69

awk compare consecutive rows

This may be quite simple, but I am stuck. Thanks for any help. I have an input file with two columns separated by a semicolon: the first column holds an ID and the second a value associated with it. I need an output in which the first column is the ID (no repetitions allowed) and the second column is the average of that ID's values. IDs are not always repeated; when one is repeated, the repeats are always consecutive, and an ID appears at most twice.

Input

10;10
10;20
20;30
20;40
30;15
40;10
40;12

Desired output

10;15
20;35
30;15
40;11

Upvotes: 0

Views: 662

Answers (3)

cadrian

Reputation: 7376

(written live, I did not try it; assume GNU awk; assume sorted input)

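awk -F';' '
    BEGIN {
        id = ""
    }
    # The ID changed: print the average for the previous ID, then reset.
    $1 != id {
        if (n > 0) {
            # %s rather than %d, so a fractional average is not truncated
            printf("%s;%s\n", id, sum/n);
        }
        n = sum = 0;
        id = $1;
    }
    # Accumulate every row, not only the first row of each ID.
    {
        sum += $2;
        n++;
    }
    # Flush the last group.
    END {
        if (n > 0) printf("%s;%s\n", id, sum/n);
    }
' file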

Upvotes: 0

Ed Morton

Reputation: 203532

$ cat tst.awk
BEGIN { FS=OFS=";" }
($1 != prev) && (NR>1) { print prev, sum/cnt; sum=cnt=0 }
{ prev=$1; sum+=$2; cnt++ }
END { if (cnt) print prev, sum/cnt }

$ awk -f tst.awk file
10;15
20;35
30;15
40;11

Upvotes: 3

Kent

Reputation: 195059

This one-liner does it:

awk -F';' -v OFS=";" '{a[$1]+=$2+0;b[$1]++}END{for(x in a)print x,a[x]/b[x]}' file

Test with your data:

kent$  cat f
10;10
10;20
20;30
20;40
30;15
40;10
40;12

kent$  awk -F';' -v OFS=";" '{a[$1]+=$2+0;b[$1]++}END{for(x in a)print x,a[x]/b[x]}' f
10;15
20;35
30;15
40;11
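
A note on ordering: awk does not specify the order in which `for (x in a)` visits array indices, so the lines above may come out in a different order with other data or another awk implementation. Below is a minimal sketch of a variant that prints IDs in the order they first appear. The function name `avg_by_id` and the arrays `order`, `sum` and `cnt` are illustrative names chosen here, not part of the one-liner above.

```shell
# Hypothetical helper: average the second column per ID, keeping input order.
avg_by_id() {
    awk -F';' -v OFS=';' '
        # First time this ID is seen: remember its position.
        !($1 in cnt) { order[++n] = $1 }
        # Accumulate the sum and the row count for every line.
        { sum[$1] += $2; cnt[$1]++ }
        # Walk the IDs by recorded position instead of using for (x in a).
        END {
            for (i = 1; i <= n; i++)
                print order[i], sum[order[i]] / cnt[order[i]]
        }
    ' "$@"
}

# Sample data from the question, fed through standard input.
printf '10;10\n10;20\n20;30\n20;40\n30;15\n40;10\n40;12\n' | avg_by_id
```

Unlike the consecutive-row approaches, this does not rely on repeated IDs being adjacent, at the cost of holding one array entry per distinct ID in memory.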

Upvotes: 4
