Reputation: 1061

cumulative sum of NF-i fields

I have

NC_042565.1  1   1  0  0  1  0  0  1  0  0  0  0  8  3  0  0  0  0  0  0  0  0  0
NC_042565.1  2   2  0  0  3  2  0  1  0  0  0  0  7  1  2  1  0  0  0  0  0  0  0
NC_042565.1  3   2  0  0  3  3  0  1  0  0  0  0  7  1  1  2  0  0  0  0  0  0  0
NC_042565.1  4   2  0  0  3  3  0  1  0  0  0  0  7  1  1  2  0  0  0  0  0  0  0
NC_042565.1  5   3  0  0  3  3  0  1  0  0  0  0  7  1  0  3  0  0  0  0  0  0  0

...

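I would like to replace each column from 14 onward with the sum of that column and every column to its right: column 14 becomes the sum of columns 14 to the last, column 15 the sum of columns 15 to the last, column 16 the sum of columns 16 to the last, and so on. So that I have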

NC_042565.1  1   1  0  0  1  0  0  1  0  0  0  0  11  3  0  0  0  0  0  0  0  0  0
NC_042565.1  2   2  0  0  3  2  0  1  0  0  0  0  11  4  3  1  0  0  0  0  0  0  0
NC_042565.1  3   2  0  0  3  3  0  1  0  0  0  0  11  4  3  2  0  0  0  0  0  0  0
NC_042565.1  4   2  0  0  3  3  0  1  0  0  0  0  11  4  3  2  0  0  0  0  0  0  0
NC_042565.1  5   3  0  0  3  3  0  1  0  0  0  0  11  4  3  3  0  0  0  0  0  0  0

I tried

awk '{for(i=2;i<=NF;i++){$i=$i+$(i-1)}} 1'  file

(from the question "Calculate cumulative sum of fields within each line")

and

awk '{for(i=15;i<=NF;i++){$i+=(s+=$NF>$i)}}1' file

Upvotes: 2

Answers (3)

Renaud Pacalet

Reputation: 29240

This is quite easy:

$ awk '{for(i=NF-1;i>=14;i--)$i+=$(i+1);print}' file
NC_042565.1 1 1 0 0 1 0 0 1 0 0 0 0 11 3 0 0 0 0 0 0 0 0 0
NC_042565.1 2 2 0 0 3 2 0 1 0 0 0 0 11 4 3 1 0 0 0 0 0 0 0
NC_042565.1 3 2 0 0 3 3 0 1 0 0 0 0 11 4 3 2 0 0 0 0 0 0 0
NC_042565.1 4 2 0 0 3 3 0 1 0 0 0 0 11 4 3 2 0 0 0 0 0 0 0
NC_042565.1 5 3 0 0 3 3 0 1 0 0 0 0 11 4 3 3 0 0 0 0 0 0 0

Explanation: we process the fields ($i) in reverse order, from $(NF-1) down to $14, and modify each $i on the fly by adding the next field ($i+=$(i+1)). Because we proceed in reverse order, $(i+1) already holds the sum of the original fields $(i+1) to $NF. So, after adding $(i+1) to $i, $i itself holds the sum of the original fields $i to $NF.
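As an illustration of the reverse-order idea above, here is a minimal sketch that wraps the same loop in a shell function so the starting column can be passed as an argument. The function name suffix_sum and the awk variable start are assumptions made for this example; they do not appear in the original answer.

```shell
# suffix_sum START (hypothetical helper): replace every field from START to
# the last one with the sum of itself and all fields to its right.
suffix_sum() {
  awk -v start="$1" '{
    for (i = NF - 1; i >= start; i--)  # walk from right to left
      $i += $(i + 1)                   # $(i+1) already holds its own suffix sum
    print
  }'
}

# Short sample line: the fields after "a" are 7 1 2 1 0.
printf 'a 7 1 2 1 0\n' | suffix_sum 2
# prints: a 11 4 3 1 0
```

For the data in the question, the equivalent call would be suffix_sum 14 < file.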

Upvotes: 2

RavinderSingh13

Reputation: 133590

With your shown samples, please try the following awk program. It reads the input file twice, so the file name is passed twice.

awk '
FNR==NR{
  for(i=14;i<=NF;i++){
    arr[FNR,i]=$i
    sum[FNR]+=$i
  }
  next
}
{
  for(i=14;i<=NF;i++){
    diff=0
    if(i>14){
      for(j=14;j<i;j++){
        diff+=arr[FNR,j]
      }
    }
    $i=(sum[FNR]-diff)
   }
}
1
' Input_file Input_file 

NC_042565.1 1 1 0 0 1 0 0 1 0 0 0 0 11 3 0 0 0 0 0 0 0 0 0
NC_042565.1 2 2 0 0 3 2 0 1 0 0 0 0 11 4 3 1 0 0 0 0 0 0 0
NC_042565.1 3 2 0 0 3 3 0 1 0 0 0 0 11 4 3 2 0 0 0 0 0 0 0
NC_042565.1 4 2 0 0 3 3 0 1 0 0 0 0 11 4 3 2 0 0 0 0 0 0 0
NC_042565.1 5 3 0 0 3 3 0 1 0 0 0 0 11 4 3 3 0 0 0 0 0 0 0

Explanation: the same program with a comment on each line.

awk '                        ##Start of the awk program.
FNR==NR{                     ##True only while reading the first copy of the file.
  for(i=14;i<=NF;i++){       ##Loop from the 14th field to the last field.
    arr[FNR,i]=$i            ##Store the original value of field i of this line.
    sum[FNR]+=$i             ##Accumulate the total of fields 14..NF for this line.
  }
  next                       ##Skip the second block during the first pass.
}
{
  for(i=14;i<=NF;i++){       ##Second pass: loop from the 14th field to the last field.
    diff=0                   ##Reset diff for this field.
    if(i>14){                ##Only fields after the 14th have anything to subtract.
      for(j=14;j<i;j++){     ##Loop over fields 14..i-1.
        diff+=arr[FNR,j]     ##Add up the original values to the left of field i.
      }
    }
    $i=(sum[FNR]-diff)       ##Set field i to the total minus everything to its left.
  }
}
1                            ##Print the (modified) current line.
' Input_file Input_file      ##Pass the same input file twice.

Upvotes: 2

Shawn

Reputation: 52449

With awk:

$ awk '{
    for (n = 1; n < 14; n++) printf "%s ", $n
    sum = 0
    for (n = 14; n <= NF; n++) sum += $n
    for (n = 14; n < NF; n++) {
      printf "%d ", sum
      sum -= $n
    }
    printf "%d\n", sum
  }' input.txt
NC_042565.1 1 1 0 0 1 0 0 1 0 0 0 0 11 3 0 0 0 0 0 0 0 0 0
NC_042565.1 2 2 0 0 3 2 0 1 0 0 0 0 11 4 3 1 0 0 0 0 0 0 0
NC_042565.1 3 2 0 0 3 3 0 1 0 0 0 0 11 4 3 2 0 0 0 0 0 0 0
NC_042565.1 4 2 0 0 3 3 0 1 0 0 0 0 11 4 3 2 0 0 0 0 0 0 0
NC_042565.1 5 3 0 0 3 3 0 1 0 0 0 0 11 4 3 3 0 0 0 0 0 0 0

The trick here is to sum up all the columns once, and then loop again, subtracting the current column from that sum each time, instead of summing up all remaining columns for each column. That makes it O(N) per line instead of O(N²).
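The total-then-subtract idea described above can also be sketched as an in-place variant that assigns to the fields instead of printing them one by one. The function name running_total and the awk variable start are assumptions made for this example, not part of the original answer.

```shell
# running_total START (hypothetical helper): one pass to total the tail of
# the line, then one pass that writes the running total into each field.
running_total() {
  awk -v start="$1" '{
    sum = 0
    for (n = start; n <= NF; n++) sum += $n  # total of fields START..NF
    for (n = start; n <= NF; n++) {
      v = $n      # remember the original value of this field
      $n = sum    # sum is currently original $n + ... + $NF
      sum -= v    # drop this field before moving one column to the right
    }
    print
  }'
}

# Short sample line: the fields after "a" are 7 1 2 1 0.
printf 'a 7 1 2 1 0\n' | running_total 2
# prints: a 11 4 3 1 0
```

Each field is read once and written once, so the work per line stays linear in the number of fields.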

Upvotes: 2
