Reputation: 69
This is probably quite simple, but I am stuck. Thanks for any help. I have an input file with two columns: the first holds an ID and the second a value associated with it. I need an output where the first column is the ID (no repetitions allowed) and the second column is the average of that ID's values. The IDs are not always repeated; when they are, the repeats are always consecutive and an ID appears at most twice.
Input
10;10
10;20
20;30
20;40
30;15
40;10
40;12
Desired output
10;15
20;35
30;15
40;11
Upvotes: 0
Views: 662
Reputation: 7376
(Written on the spot and untested; assumes GNU awk and sorted input.)
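awk -F';' '
BEGIN {
    id = ""
}
# A new ID starts: print the finished group, then reset the accumulators.
$1 != id {
    if (id != "") {
        printf("%s;%s\n", id, sum/n)
    }
    n = sum = 0
    id = $1
}
# Every line adds to the current group.
{
    sum += $2
    n++
}
END {
    if (n > 0) printf("%s;%s\n", id, sum/n)
}
'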
Upvotes: 0
Reputation: 203532
$ cat tst.awk
BEGIN { FS=OFS=";" }
($1 != prev) && (NR>1) { print prev, sum/cnt; sum=cnt=0 }
{ prev=$1; sum+=$2; cnt++ }
END { if (cnt) print prev, sum/cnt }
$ awk -f tst.awk file
10;15
20;35
30;15
40;11
Upvotes: 3
Reputation: 195059
This one-liner does it:
awk -F';' -v OFS=";" '{a[$1]+=$2+0;b[$1]++}END{for(x in a)print x,a[x]/b[x]}' file
Test with your data:
kent$ cat f
10;10
10;20
20;30
20;40
30;15
40;10
40;12
kent$ awk -F';' -v OFS=";" '{a[$1]+=$2+0;b[$1]++}END{for(x in a)print x,a[x]/b[x]}' f
10;15
20;35
30;15
40;11
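One caveat: awk does not specify the order in which for (x in a) visits array keys, so the IDs may come out in a different order on another awk or with other data. If the order matters, a minimal sketch that remembers the order in which IDs first appear could look like this. The function name avg_by_id and the arrays order, sum and cnt are illustrative names, not part of the one-liner above.

```shell
# avg_by_id is a hypothetical helper name used only for this sketch.
avg_by_id() {
    awk -F';' -v OFS=';' '
        !($1 in cnt) { order[++n] = $1 }   # remember each ID the first time it appears
        { sum[$1] += $2; cnt[$1]++ }       # accumulate per-ID sum and count
        END {
            for (i = 1; i <= n; i++)       # emit in first-seen order
                print order[i], sum[order[i]] / cnt[order[i]]
        }
    ' "$@"
}

# Usage with deliberately unsorted input: IDs print in first-seen order.
printf '%s\n' '20;30' '10;10' '20;40' '10;20' | avg_by_id
# 20;35
# 10;15
```

Like the one-liner, this does not need the repeated IDs to be consecutive; it only adds an index array so that the END loop runs over integers instead of relying on the key traversal order.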
Upvotes: 4