Nikola

Reputation: 17

Skip row value(s) with awk

I have the following input file:

 -0.805813  0.874753 -0.776101 -0.749147 -0.636834  0.379035 -0.004061 -0.004061
 -0.426119 -0.024801 -0.041989 -0.783686  0.361837  0.055206  0.368603  0.147965
 -0.632526 -0.100358  0.847947 -0.690233 -0.996141  0.445275  1.086014 -1.097968
  0.411383  0.411383 -0.734988  0.344954  2.577123 -0.372104 -0.923401  0.302907
  0.302907 -1.424862  1.165900 -0.776100 -0.776100 -0.495400  0.182533  0.002356
  0.002356  0.002356

I used awk to calculate the sum of these values in a sequential order (sum = -3.0000):

awk '{ for (i=1; i<=NF; i++) sum += $i } END { printf("%3.4f", sum) }' input.txt

Is it possible to use awk to skip a run of values counted back from the end of the file and to sum only the remaining values? For instance:

 -0.805813  0.874753 -0.776101 -0.749147 -0.636834  0.379035 -0.004061 -0.004061
 -0.426119 -0.024801 -0.041989 -0.783686  0.361837  0.055206  0.368603  0.147965
 -0.632526 -0.100358  0.847947 -0.690233 -0.996141  0.445275  1.086014 -1.097968
  0.411383  0.411383 -0.734988  0.344954  2.577123 -0.372104 -0.923401  0.302907
  0.302907 -1.424862  **1.165900 -0.776100 -0.776100 -0.495400  0.182533  0.002356
  0.002356  0.002356**

where I want to skip the values between the stars (sum = -2.3079). The number of values to skip may vary.

Thanks!

I have already achieved this with sed piped into awk:

sed '$d' input.txt | awk '{ for (i=1; i<=NF; i++) sum += $i } END { for (i=NF-5; i<=NF; i++) sum -= $i; print sum }'

However, a pure awk one-liner would be preferred.

Upvotes: 1

Views: 129

Answers (5)

RARE Kpop Manifesto

Reputation: 2895

This awk approach makes no assumptions about:

  • the number of ** ... ** sections, if any
  • whether lines end in \r\n or \n
  • how many columns there are per line

The full input is processed in one shot, which avoids the need for a temporary storage array.

Even-numbered fields, which are where the skipped sections reside, get blanked out; the leftovers are then re-split into usable fields.


mawk 'BEGIN { FS =  (_ = "[*]")_
              RS = (__ = "") "^$" } END { 

   for (++_; _++ < NF; _++) $_ = __

        _*= FS = "[ \11-\15]+"
       $_ = $_
        _ = ++NF
          
   while(--_)__ += $_
    
   printf("%.16g\n", __) }'

-2.307901
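
For readers who find the compact version above hard to follow, the same idea (split on the ** markers, keep only the odd-numbered pieces) might be sketched more verbosely as below. This is only a sketch under assumptions, not the code from this answer: the function name sum_outside_stars and all variable names are made up for illustration, it prints four decimals like the question does, and unlike the original it does use a temporary string and array.

```shell
sum_outside_stars() {
    awk '
        { text = text $0 "\n" }               # collect the whole input in one string
        END {
            n = split(text, part, /[*][*]/)   # odd-numbered parts lie outside ** ... **
            for (p = 1; p <= n; p += 2) {
                m = split(part[p], val)       # default FS: split on runs of whitespace
                for (j = 1; j <= m; j++) sum += val[j]
            }
            printf "%.4f\n", sum
        }
    ' "$@"
}

# Usage with a shortened, two-line starred sample (hypothetical input):
printf '%s\n' \
    '  0.302907 -1.424862  **1.165900 -0.776100' \
    '  0.002356  0.002356**' | sum_outside_stars    # prints -1.1220
```

Because the pieces are split with the default field separator, the sketch should not care how many columns each line has; a trailing \r on a number is ignored by awk's string-to-number conversion.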

Upvotes: 0

jhnc

Reputation: 16819

Stripping down @markp-fuso's idea:

awk -v RS=' ' '
    NF {
        ndx = cnt++ % lastN
        sum += circlist[ndx]
        circlist[ndx] = +$0
    }
    END { printf "%3.4f", sum }
' lastN=8 input.txt

The reason his array initialization and comparisons are not needed is that awk guarantees the value of an uninitialized variable or array element: it is the empty string, which counts as 0 in a numeric context.
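
A tiny demonstration of that guarantee, as a sketch (the array name circlist is reused from the code above, and the index 3 is arbitrary):

```shell
out=$(awk 'BEGIN {
    sum += circlist[3]          # circlist[3] was never assigned: contributes 0
    print sum                   # prints 0
    print (circlist[3] == "")   # prints 1: it also compares equal to the empty string
}')
printf '%s\n' "$out"
```

So adding a not-yet-filled slot of the circular list to sum is harmless, which is why no sentinel value is needed.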

Splitting input on spaces (RS=' ') instead of newlines, and then checking that the record has a field (the default behaviour of FS splits on the remaining whitespace), is more compact than his for loop over each field, but it requires at least one actual space character between consecutive numbers.

Your example lines begin with a leading space; if they did not, my code would fail silently by discarding the first element on each line (it would become $2 but +$0 is parsed as just the value of $1). If your awk supports using regex as RS (which a future standard may allow, and many popular versions already support), this problem can be fixed by using RS='[[:space:]]+'. (Or by using the original for loop to iterate over the fields.)
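
The second option (keeping the field loop) might look like the sketch below. It assumes lastN is a positive integer; the wrapper function name sum_skip_last is invented here purely so the example is self-contained.

```shell
sum_skip_last() {
    # $1 = lastN; any further arguments are input files (stdin if none)
    lastN=$1; shift
    awk -v lastN="$lastN" '
        {
            for (i = 1; i <= NF; i++) {   # default FS copes with any whitespace
                ndx = cnt++ % lastN
                sum += circlist[ndx]      # unset slots add 0
                circlist[ndx] = $i
            }
        }
        END { printf "%3.4f\n", sum }
    ' "$@"
}

# Lines without a leading space, numbers separated by a tab or several spaces:
printf '1\t2   3\n4 5\n' | sum_skip_last 2    # sums 1 + 2 + 3, prints 6.0000
```

This trades a little compactness for not depending on how the numbers are separated or on regex support in RS.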

Upvotes: 1

ufopilot

Reputation: 3985

Using GNU AWK

$ awk -v RS='\\s' '!/^$/' file |
    awk -v n=8 '{sum[NR]=sum[NR-1]+$1} END{print sum[NR-n]}'
-2.3079

Upvotes: 1

Ed Morton

Reputation: 204456

Assuming you want to skip the last N numbers in the input, then using any awk:

$ awk -v n=8 '
    { for (i=1; i<=NF; i++) vals[++c]=$i }
    END { for (i=1; i<=c-n; i++) sum+=vals[i]; printf "%3.4f", sum }
' file
-2.3079

or if you wanted to skip all values on the last line plus the 6 values at the end of the line before that:

$ awk -v n=6 '
    { for (i=1; i<=NF; i++) vals[++c]=$i }
    END { for (i=1; i<=c-(n+NF); i++) sum+=vals[i]; printf "%3.4f", sum }
' file
-2.3079

Upvotes: 2

markp-fuso

Reputation: 35256

General approach:

  • to add all but the last N numbers ...
  • as we read a number we place it in a circular list (aka an array with an index based on count-of-numbers modulo N)
  • as we re-use a place in the circular list we add the previous number to the sum
  • when done we'll have a circular list that contains the last N numbers and a sum of all numbers up to, but not including, those numbers in the circular list

One awk idea:

awk -v lastN=HOW_MANY_TO_IGNORE '
BEGIN { for (i=0;i<lastN;i++) circlist[i]="X" }                    # initialize circular list
      { for (i=1;i<=NF;i++) {
            cnt++                                                  # increment count of numbers seen so far
            ndx=cnt%lastN                                          # calculate modulo index
            sum+=(circlist[ndx] != "X" ? circlist[ndx] : 0)        # add previous entry from circlist[] ?
            circlist[ndx]=$i                                       # add current value to circlist[]
        }
      }
END   { printf("%3.4f", sum) }
' input.txt

NOTES:

  • assumes all inputs are valid numbers; otherwise OP can add logic to validate the inputs
  • assumes lastN is assigned a positive integer; otherwise OP can add logic to validate the value of lastN
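
One way such validation could be added is sketched below. The wrapper name sum_all_but_last, the bad flag, the error message and the number pattern are all assumptions made for this illustration; the pattern accepts plain decimals only (no exponent notation), and non-matching fields are silently skipped.

```shell
sum_all_but_last() {
    # $1 = lastN (how many trailing numbers to ignore); other args = input files
    n=$1; shift
    awk -v lastN="$n" '
        BEGIN {
            if (lastN !~ /^[0-9]+$/ || lastN + 0 < 1) {
                print "lastN must be a positive integer" > "/dev/stderr"
                bad = 1
                exit 1                    # END still runs, hence the bad flag
            }
        }
        {
            for (i = 1; i <= NF; i++) {
                # skip anything that is not a plain decimal number
                if ($i !~ /^[-+]?([0-9]+[.]?[0-9]*|[.][0-9]+)$/) continue
                ndx = cnt++ % lastN
                sum += circlist[ndx]
                circlist[ndx] = $i
            }
        }
        END {
            if (bad) exit 1
            printf "%3.4f\n", sum
        }
    ' "$@"
}

# "foo" is ignored, so the numbers seen are 1 2 3 4 5 and the last two are dropped:
printf '1 2 foo 3\n4 5\n' | sum_all_but_last 2    # prints 6.0000
```

Rejecting lastN values below 1 also avoids a modulo-by-zero in the index calculation.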

For OP's 2nd set of data we use -v lastN=8 which generates:

-2.3079

To verify the result, note that the 1st number to be ignored (1.165900) occurs only once in the data set, so we can hardcode it into OP's current code:

$ awk '{for (i=1;i<=NF;i++) if ($i == 1.165900) exit; else sum += $i} END {printf("%3.4f", sum)}' input.txt
-2.3079

Upvotes: 2
