Reputation: 9036

Check if two lines start with the same character, if so the output average, if not, print actual value

I'd like to check if two rows start with the same number in the 1st column, if that happens, then the average of the 2nd column should be displayed. File example:

01  21    6    10%       93.3333%
01  22    50   83.3333%  93.3333%
02  20.5  23   18.1102%  96.8504%
02  21.5  100  78.7402%  96.8504%
03  22.2  0    0%        100%
03  21.2  29   100%      100%
04  22.5  1    5.55556%  100%
04  23.5  17   94.4444%  100%
05  22.7  9    7.82609%  100%
05  21.7  106  92.1739%  100%
06  23    11   17.4603%  96.8254%
06  22    50   79.3651%  96.8254%
07  20.5  14   18.6667%  96%
07  21.5  58   77.3333%  96%
08  21.8  4    100%      100%
09  22.6  0    0%        100%
09  21.6  22   100%      100%

For instance, the two first lines start with 01, but there is only one line starting with 08 (15th line). Therefore, the output based on these two cases should be:

01 21.5
...
...
...
08 21.8
...
...
...

I ended up with the following awk line, which works great when the file always has two similar lines, but it fails using the file shown above (because the 15th line):

awk '{sum+=$2} (NR%2)==0{print sum/2; sum=0;}'

Any hint is welcomed,

Upvotes: 6

Answers (4)

Costas

Reputation: 343

awk '
second{
    if($1 == first){
        print (second + $2) / 2
        second = 0
        next
    }
    else
        print second
}
{
    printf "%s ", $1
    fist = $1
    second = $2
}
END{
    if(second)
        print second
}' file

Upvotes: 1

anubhava

Reputation: 784938

This awk should work:

awk 'function dump(){if (n>0) printf "%s%s%.2f\n", p, OFS, sum/n}
     NR>1 && $1 != p{dump(); sum=n=0} {p=$1; sum+=$2; n++} END{dump()}' file
01 21.5
02 21.0
03 21.7
04 23.0
05 22.2
06 22.5
07 21.0
08 21.8
09 22.1

Explanation: We are using 3 variables:

p -> to hold previous row's $1 value
n -> count of similar $1 values
sum -> is sum of $2 values for similar $1 rows

How it works:

NR>1 && $1 != p     # when row #1 > 1 and prev $1 is not current $1
dump()              # function is to print formatted value of $1 and average
p=$1; sum+=$2; n++  # sets p to $1, adds current $2 to sum and increments n

Upvotes: 4

karakfa

Reputation: 67467

awk with piped sort

awk '{s[$1]+=$2;c[$1]++} END{for(i in s) print i, s[i]/c[i]}' file | sort

Upvotes: 1

glenn jackman

Reputation: 246744

Using GNU awk

gawk '
    {sum[$1]+=$2; n[$1]++} 
    END {
        PROCINFO["sorted_in"] = "@ind_num_asc"
        for (key in sum) print key, sum[key]/n[key]
    }
' file

The "PROCINFO" line makes the array traversal sorted my index numerically. Otherwise the output would appear random.

Upvotes: 4

Check if two lines start with the same character, if so the output average, if not, print actual value

Answers (4)

Related Questions