Reputation: 1061
I have a file like this:
NC_042565.1 1 1 0 0 1 0 0 1 0 0 0 0 8 3 0 0 0 0 0 0 0 0 0
NC_042565.1 2 2 0 0 3 2 0 1 0 0 0 0 7 1 2 1 0 0 0 0 0 0 0
NC_042565.1 3 2 0 0 3 3 0 1 0 0 0 0 7 1 1 2 0 0 0 0 0 0 0
NC_042565.1 4 2 0 0 3 3 0 1 0 0 0 0 7 1 1 2 0 0 0 0 0 0 0
NC_042565.1 5 3 0 0 3 3 0 1 0 0 0 0 7 1 0 3 0 0 0 0 0 0 0
...
I would like to replace column 14 with the sum of columns 14 to the last, column 15 with the sum of columns 15 to the last, column 16 with the sum of columns 16 to the last, and so on, so that I get:
NC_042565.1 1 1 0 0 1 0 0 1 0 0 0 0 11 3 0 0 0 0 0 0 0 0 0
NC_042565.1 2 2 0 0 3 2 0 1 0 0 0 0 11 4 3 1 0 0 0 0 0 0 0
NC_042565.1 3 2 0 0 3 3 0 1 0 0 0 0 11 4 3 2 0 0 0 0 0 0 0
NC_042565.1 4 2 0 0 3 3 0 1 0 0 0 0 11 4 3 2 0 0 0 0 0 0 0
NC_042565.1 5 3 0 0 3 3 0 1 0 0 0 0 11 4 3 3 0 0 0 0 0 0 0
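Put as a formula (x_j is the original value in column j, y_i the value wanted in column i, and N the number of fields; these symbols are introduced here only for illustration), every column from 14 onward becomes the sum of itself and all columns to its right, i.e. a suffix sum:

```latex
y_i =
\begin{cases}
  x_i, & 1 \le i \le 13,\\
  \displaystyle\sum_{j=i}^{N} x_j, & 14 \le i \le N,
\end{cases}
\qquad
\text{e.g. line 2: } y_{14} = 7+1+2+1 = 11,\quad y_{15} = 1+2+1 = 4,\quad y_{16} = 2+1 = 3.
```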
I tried
awk '{for(i=2;i<=NF;i++){$i=$i+$(i-1)}} 1' file
(adapted from the question "Calculate cumulative sum of fields within each line")
and
awk '{for(i=15;i<=NF;i++){$i+=(s+=$NF>$i)}}1' file
Upvotes: 2
Views: 163
Reputation: 29240
This is quite easy:
$ awk '{for(i=NF-1;i>=14;i--)$i+=$(i+1);print}' file
NC_042565.1 1 1 0 0 1 0 0 1 0 0 0 0 11 3 0 0 0 0 0 0 0 0 0
NC_042565.1 2 2 0 0 3 2 0 1 0 0 0 0 11 4 3 1 0 0 0 0 0 0 0
NC_042565.1 3 2 0 0 3 3 0 1 0 0 0 0 11 4 3 2 0 0 0 0 0 0 0
NC_042565.1 4 2 0 0 3 3 0 1 0 0 0 0 11 4 3 2 0 0 0 0 0 0 0
NC_042565.1 5 3 0 0 3 3 0 1 0 0 0 0 11 4 3 3 0 0 0 0 0 0 0
Explanation: we just process the fields ($i) in reverse order, from $(NF-1) down to $14, modifying $i on the fly by adding the next field ($i+=$(i+1)). Because we proceed in reverse order, $(i+1) already holds the sum of the original fields $(i+1) to $NF. So, after adding $(i+1) to $i, $i itself becomes the sum of the original fields $i to $NF.
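As a minimal sketch, assuming you want the starting column as a parameter instead of hard-coding 14, the same reverse loop can be wrapped in a shell function. The function name suffix_sums and its start argument are hypothetical, not part of the answer; the awk loop itself is unchanged.

```shell
# suffix_sums START (hypothetical helper): replace every field from column
# START to the last with the sum of itself and all fields to its right.
suffix_sums() {
    awk -v start="$1" '{
        # Walk backwards from the next-to-last field down to column start.
        # When field i is visited, $(i+1) already holds the sum of the
        # original fields i+1..NF, so adding it makes $i the sum of i..NF.
        for (i = NF - 1; i >= start; i--)
            $i += $(i + 1)
        print
    }'
}

# Usage on the second sample line from the question:
printf '%s\n' 'NC_042565.1 2 2 0 0 3 2 0 1 0 0 0 0 7 1 2 1 0 0 0 0 0 0 0' | suffix_sums 14
```

Assigning to a field makes awk rebuild the line with OFS (a single space by default), which matches the single-space layout of the sample data.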
Upvotes: 2
Reputation: 133590
With your shown samples, please try the following awk program.
awk '
FNR==NR{
  for(i=14;i<=NF;i++){
    arr[FNR,i]=$i
    sum[FNR]+=$i
  }
  next
}
{
  for(i=14;i<=NF;i++){
    diff=0
    if(i>14){
      for(j=14;j<i;j++){
        diff+=arr[FNR,j]
      }
    }
    $i=(sum[FNR]-diff)
  }
}
1
' Input_file Input_file
NC_042565.1 1 1 0 0 1 0 0 1 0 0 0 0 11 3 0 0 0 0 0 0 0 0 0
NC_042565.1 2 2 0 0 3 2 0 1 0 0 0 0 11 4 3 1 0 0 0 0 0 0 0
NC_042565.1 3 2 0 0 3 3 0 1 0 0 0 0 11 4 3 2 0 0 0 0 0 0 0
NC_042565.1 4 2 0 0 3 3 0 1 0 0 0 0 11 4 3 2 0 0 0 0 0 0 0
NC_042565.1 5 3 0 0 3 3 0 1 0 0 0 0 11 4 3 3 0 0 0 0 0 0 0
Explanation: here is a detailed explanation of the above.
awk ' ##Start the awk program.
FNR==NR{ ##FNR==NR holds only while reading the first copy of Input_file.
  for(i=14;i<=NF;i++){ ##Loop from the 14th field to the last field of the line.
    arr[FNR,i]=$i ##Store the original value of field i of line FNR.
    sum[FNR]+=$i ##Add it to the total of fields 14..NF for line FNR.
  }
  next ##Skip the remaining statements for this line.
}
{ ##Second pass: rewrite fields 14..NF.
  for(i=14;i<=NF;i++){ ##Loop from the 14th field to the last field.
    diff=0 ##Reset diff for each field.
    if(i>14){ ##Only fields after the 14th have something to subtract.
      for(j=14;j<i;j++){ ##Loop over fields 14..i-1.
        diff+=arr[FNR,j] ##Accumulate the original values to subtract from the total.
      }
    }
    $i=(sum[FNR]-diff) ##Total minus fields 14..i-1, i.e. the sum of fields i..NF.
  }
}
1 ##Print the (modified) current line.
' Input_file Input_file ##Pass the same Input_file twice.
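The identity this two-pass program relies on can be written out as follows, where T plays the role of sum[FNR], the subtracted sum is what the inner j loop accumulates in diff, and x_j, y_i and N (original value in column j, new value in column i, number of fields) are notation introduced here for illustration:

```latex
T = \sum_{j=14}^{N} x_j,
\qquad
y_i = T - \sum_{j=14}^{i-1} x_j \quad (14 \le i \le N)
```

For i = 14 the subtracted sum is empty (diff stays 0), so y_14 = T. For the second sample line, T = 7+1+2+1 = 11, giving y_15 = 11 - 7 = 4, y_16 = 11 - (7+1) = 3 and y_17 = 11 - (7+1+2) = 1, matching the output above.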
Upvotes: 2
Reputation: 52449
With awk:
$ awk '{
  # Print fields 1-13 unchanged.
  for (n = 1; n < 14; n++) printf "%s ", $n
  # Total of fields 14..NF.
  sum = 0
  for (n = 14; n <= NF; n++) sum += $n
  # Print the running total, then drop the current field from it.
  for (n = 14; n < NF; n++) {
    printf "%d ", sum
    sum -= $n
  }
  # What remains is the last field.
  printf "%d\n", sum
}' input.txt
NC_042565.1 1 1 0 0 1 0 0 1 0 0 0 0 11 3 0 0 0 0 0 0 0 0 0
NC_042565.1 2 2 0 0 3 2 0 1 0 0 0 0 11 4 3 1 0 0 0 0 0 0 0
NC_042565.1 3 2 0 0 3 3 0 1 0 0 0 0 11 4 3 2 0 0 0 0 0 0 0
NC_042565.1 4 2 0 0 3 3 0 1 0 0 0 0 11 4 3 2 0 0 0 0 0 0 0
NC_042565.1 5 3 0 0 3 3 0 1 0 0 0 0 11 4 3 3 0 0 0 0 0 0 0
The trick here is to sum up all the columns once and then loop again, subtracting the current column from that sum each time, instead of re-summing all remaining columns for every column. This makes it O(N) per line instead of O(N²), where N is the number of columns.
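To put rough numbers on that claim (counting values touched per line; k, the number of columns being summed, is a symbol introduced here): re-summing the remaining columns for every column touches k + (k-1) + ... + 1 values, whereas the single-total approach reads each of the k values once for the total and then subtracts k-1 of them.

```latex
\underbrace{\sum_{m=1}^{k} m = \frac{k(k+1)}{2}}_{\text{re-sum per column}}
\quad\text{vs.}\quad
\underbrace{k + (k-1) = 2k - 1}_{\text{one total, then subtract}}
```

For the sample lines (24 fields, summing from column 14, so k = 11) that is 66 versus 21 per line, and the gap grows quadratically as lines get wider.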
Upvotes: 2