VNA

Reputation: 625

awk to generate consecutive sequence - Continuation:

I would like to read the first field and generate a sequence based on the "&-" and "&&-" delimiters. I also need to read the first column and fill each empty value downward with the previous non-empty value.

However, the actual input file is not separated by commas (FS=",") or tabs (FS="\t"); its columns are padded with spaces.

Example: if the DIGITS field is 210&-3, I need to populate 210 and 213 only. If the DIGITS field is 210&&-3, I need to populate 210, 211, 212 and 213.
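To make the two delimiters concrete, here is a minimal sketch that expands a single DIGITS value on its own, ignoring the other columns. The helper name expand_digits is hypothetical and not part of any script in this thread; the sketch assumes every value after the first is a negative offset that replaces the trailing digits of the first value, as in the examples above.

```shell
# expand_digits is a hypothetical helper, used only for illustration.
expand_digits() {
    awk -v spec="$1" 'BEGIN {
        n = split(spec, arr, /&/)        # "210&&-3" gives arr[1]="210", arr[2]="", arr[3]="-3"
        base = arr[1]
        out = prev = base
        for (i = 2; i <= n; i++) {
            fill = 0
            if (arr[i] == "") {          # an empty element means the delimiter was "&&-"
                fill = 1
                i++
            }
            val = base - arr[i]          # arr[i] is negative: 210 - (-3) = 213
            if (fill)
                for (j = prev + 1; j < val; j++)
                    out = out " " j      # fill the gap between prev and val
            out = out " " val
            prev = val
        }
        print out
    }'
}

expand_digits '210&-3'     # prints: 210 213
expand_digits '210&&-3'    # prints: 210 211 212 213
```

The empty array element between the two ampersands is what distinguishes "&&-" (fill the range) from "&-" (just the end value).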

Input.txt

DIGITS                   AL DEST         CHI CNT NEDEST       CORG  NCHA



  20                        0 ABC          1   N   DEFABC       0     CHARGE      
                            1 ABC          1   N   GHIABC       0     CHARGE      
                            2 ABC          1   N   JKLABC       0     CHARGE      
                            3 ABC          1   N   MNOABC       0     CHARGE      
                            4 ABC          1   N   PQRABC       0     CHARGE    
  2130&&-4&-6&&-8           0 ABC          1   N   DEFABC       0     CHARGE      
                            1 ABC          1   N   GHIABC       0     CHARGE      

Hence, I have followed the two steps below to achieve the desired output.

Step 1: Read the first column, then fill each empty value downward with the previous non-empty value.

awk 'a=/^ /{$0=(x)substr($0,length(x)+1)}!a{x=$1}1' Input.txt > Op_Step1.txt

Op_Step1.txt

  20                        0 ABC          1   N   DEFABC       0     CHARGE
  20                        1 ABC          1   N   GHIABC       0     CHARGE
  20                        2 ABC          1   N   JKLABC       0     CHARGE
  20                        3 ABC          1   N   MNOABC       0     CHARGE
  20                        4 ABC          1   N   PQRABC       0     CHARGE
  2130&&-4&-6&&-8           0 ABC          1   N   DEFABC       0     CHARGE
  2130&&-4&-6&&-8           1 ABC          1   N   GHIABC       0     CHARGE

Step 2: Read the first field from Op_Step1.txt, then generate the sequence based on the "&-" and "&&-" delimiters.

Thanks to EdMorton for the script below:

$ awk -f tst.awk Op_Step1.txt

Since the above input is not separated by commas (FS=",") or tabs (FS="\t"), the script below does not work:

BEGIN{ FS="\t" }
  {
      for (i=1;i<=NF;i++) {
          if ($i == "") {
              i++
              $i = $1 - $i
              for (j=(prev+1);j<$i;j++) {
                  print j
              }
          }
          else if ($i < 0) {
              $i = $1 - $i
          }

          print $i
          prev = $i
      }
}

Desired Output:

   20                        0 ABC          1   N   DEFABC       0     CHARGE
   20                        1 ABC          1   N   GHIABC       0     CHARGE
   20                        2 ABC          1   N   JKLABC       0     CHARGE
   20                        3 ABC          1   N   MNOABC       0     CHARGE
   20                        4 ABC          1   N   PQRABC       0     CHARGE
   2130                      0 ABC          1   N   DEFABC       0     CHARGE
   2131                      0 ABC          1   N   DEFABC       0     CHARGE
   2132                      0 ABC          1   N   DEFABC       0     CHARGE
   2133                      0 ABC          1   N   DEFABC       0     CHARGE
   2134                      0 ABC          1   N   DEFABC       0     CHARGE
   2136                      0 ABC          1   N   DEFABC       0     CHARGE
   2137                      0 ABC          1   N   DEFABC       0     CHARGE
   2138                      0 ABC          1   N   DEFABC       0     CHARGE
   2130                      1 ABC          1   N   GHIABC       0     CHARGE
   2131                      1 ABC          1   N   GHIABC       0     CHARGE
   2132                      1 ABC          1   N   GHIABC       0     CHARGE
   2133                      1 ABC          1   N   GHIABC       0     CHARGE
   2134                      1 ABC          1   N   GHIABC       0     CHARGE
   2136                      1 ABC          1   N   GHIABC       0     CHARGE
   2137                      1 ABC          1   N   GHIABC       0     CHARGE
   2138                      1 ABC          1   N   GHIABC       0     CHARGE

Any suggestions? Sorry for the lengthy post!

Update Comments:

NR==1 || !NF { next }                # AVN: Skip the header line and blank lines

/^[[:digit:]]/ {                     # AVN: Match lines that start with a digit [0-9]
    blanks = range = $1              # AVN: Assigned when the line begins with [0-9] and not with a blank
                                     # EM: saves the value of $1 in variable "range" and also saves it in variable "blanks"
    gsub(/./," ",blanks)             # AVN: To fill the empty field with the previously assigned value
                                     # EM: replaces every character in the variable "blanks" with a blank character.
    $0 = blanks substr($0,length(blanks)+1) # AVN: Not able to understand
                                     # EM: Replaces $1 with a string of the same length but all-blanks so that when we
                                     # later need to change "2130&&-4&-6&&-8" to "2130", "2131", etc. we won't have
                                     # to deal with the original string "2130&&-4&-6&&-8" still being present in $0.
                                     # Remember we saved the original $1 value in the variable "range" so
                                     # it's OK to overwrite the characters in $0 now. We don't simply re-assign
                                     # $1 as that would cause $0 to be recompiled using the current OFS value and
                                     # so destroy all of your original spacing.
}

{
    split(range,arr,/&/)             # AVN: Split on & and store the pieces in the array arr
    for (i=1;i in arr;i++) {         # AVN: Loop over the elements of arr
        if (arr[i] == "") {          # AVN: Not able to follow the array logic below
                                     # EM: split("2130&&-4&-6&&-8",arr,/&/) populates arr as
                                     # arr[1]=2130, arr[2]="", arr[3]=-4, arr[4]=-6, arr[5]="", arr[6]=-8
                                     # That should help you understand the loop logic - if in doubt add prints
                                     # to dump array and other variable values then update your comments.
            i++
            for (j=(prev+1);j<(arr[1]-arr[i]);j++) {
                print j substr($0,length(j)+1)
            }
        }

        if (arr[i] < 0) {
            arr[i] = arr[1] - arr[i]
        }

        print arr[i] substr($0,length(arr[i])+1)
        prev = arr[i]
    }
}
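The EM comment about not re-assigning $1 can be checked with a small experiment. The sketch below uses a made-up sample row (not taken from Input.txt) and compares assigning to $1 with blanking the first column inside $0 and then overlaying the new value, which is the approach the script takes.

```shell
# Hypothetical sample row: an 8-character first column, then space-aligned columns.
line='2130&&-4        0 ABC     1'

# Assigning to $1 makes awk rebuild $0 with OFS (one space by default),
# so the original column alignment is lost.
rebuilt=$(printf '%s\n' "$line" | awk '{ $1 = "2130"; print }')

# Blanking the first column inside $0 and overlaying the value keeps the width.
kept=$(printf '%s\n' "$line" | awk '{
    blanks = $1
    gsub(/./, " ", blanks)                       # same length as $1, all spaces
    $0 = blanks substr($0, length(blanks) + 1)   # first column is now blank
    val = "2130"
    print val substr($0, length(val) + 1)        # overlay val on the blank column
}')

printf '%s\n' "$rebuilt" "$kept"
```

The first printed line has single spaces between fields, while the second has the same length as the input row, so the later columns stay where they were.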

Upvotes: 0


Answers (1)

Ed Morton

Reputation: 203684

In the script you got from me, instead of setting FS to & and looping on the fields, do split($1,arr,/&/) and loop on the elements of arr.

Since you put effort into doing it yourself and got close and the remaining details aren't completely obvious, here's the full script:

$ cat tst.awk
NR==1 || !NF { next }

/^[[:digit:]]/ {
    blanks = range = $1
    gsub(/./," ",blanks)
    $0 = blanks substr($0,length(blanks)+1)
}

{
    split(range,arr,/&/)
    for (i=1;i in arr;i++) {
        if (arr[i] == "") {
            i++
            for (j=(prev+1);j<(arr[1]-arr[i]);j++) {
                print j substr($0,length(j)+1)
            }
        }

        if (arr[i] < 0) {
            arr[i] = arr[1] - arr[i]
        }

        print arr[i] substr($0,length(arr[i])+1)
        prev = arr[i]
    }
}


$ cat file
DIGITS                   AL DEST         CHI CNT NEDEST       CORG  NCHA



20                        0 ABC          1   N   DEFABC       0     CHARGE
                          1 ABC          1   N   GHIABC       0     CHARGE
                          2 ABC          1   N   JKLABC       0     CHARGE
                          3 ABC          1   N   MNOABC       0     CHARGE
                          4 ABC          1   N   PQRABC       0     CHARGE
2130&&-4&-6&&-8           0 ABC          1   N   DEFABC       0     CHARGE
                          1 ABC          1   N   GHIABC       0     CHARGE


$ awk -f tst.awk file
20                        0 ABC          1   N   DEFABC       0     CHARGE
20                        1 ABC          1   N   GHIABC       0     CHARGE
20                        2 ABC          1   N   JKLABC       0     CHARGE
20                        3 ABC          1   N   MNOABC       0     CHARGE
20                        4 ABC          1   N   PQRABC       0     CHARGE
2130                      0 ABC          1   N   DEFABC       0     CHARGE
2131                      0 ABC          1   N   DEFABC       0     CHARGE
2132                      0 ABC          1   N   DEFABC       0     CHARGE
2133                      0 ABC          1   N   DEFABC       0     CHARGE
2134                      0 ABC          1   N   DEFABC       0     CHARGE
2136                      0 ABC          1   N   DEFABC       0     CHARGE
2137                      0 ABC          1   N   DEFABC       0     CHARGE
2138                      0 ABC          1   N   DEFABC       0     CHARGE
2130                      1 ABC          1   N   GHIABC       0     CHARGE
2131                      1 ABC          1   N   GHIABC       0     CHARGE
2132                      1 ABC          1   N   GHIABC       0     CHARGE
2133                      1 ABC          1   N   GHIABC       0     CHARGE
2134                      1 ABC          1   N   GHIABC       0     CHARGE
2136                      1 ABC          1   N   GHIABC       0     CHARGE
2137                      1 ABC          1   N   GHIABC       0     CHARGE
2138                      1 ABC          1   N   GHIABC       0     CHARGE

Upvotes: 4
