I would like to read the first field and generate a sequence based on the "&-" and "&&-" delimiters, and also fill each empty first-column value downward with the previous non-empty first-column value.
However, the actual input file is separated by neither commas (FS=",") nor tabs (FS="\t").
Ex: If the DIGITS field is 210&-3, I need to populate 210 and 213 only. If the DIGITS field is 210&&-3, I need to populate 210, 211, 212 and 213.
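To make the expansion rule concrete, here is a minimal sketch that expands a single DIGITS value on its own, without the rest of the line. The function name expand_digits is hypothetical, and the sketch assumes the offsets are added arithmetically to the leading number (210 and -3 give 213), which is what the examples suggest.

```sh
# Hypothetical helper: expand one DIGITS spec such as '210&&-3'.
expand_digits() {
    awk -v spec="$1" 'BEGIN {
        n = split(spec, part, /&/)        # "210&&-3" -> "210", "", "-3"
        base = part[1] + 0
        print base
        prev = base
        for (i = 2; i <= n; i++) {
            if (part[i] == "") {          # empty element: "&&" marks a range
                i++
                last = base - part[i]     # part[i] is negative, e.g. -3
                for (v = prev + 1; v <= last; v++) print v
            } else {                      # single "&": one extra value only
                last = base - part[i]
                print last
            }
            prev = last
        }
    }'
}

expand_digits '210&-3'     # prints 210 and 213, one per line
expand_digits '210&&-3'    # prints 210, 211, 212 and 213, one per line
```

The same function applied to '2130&&-4&-6&&-8' yields 2130 to 2134 followed by 2136 to 2138, which matches the first column of the desired output.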
Input.txt
DIGITS AL DEST CHI CNT NEDEST CORG NCHA
20 0 ABC 1 N DEFABC 0 CHARGE
1 ABC 1 N GHIABC 0 CHARGE
2 ABC 1 N JKLABC 0 CHARGE
3 ABC 1 N MNOABC 0 CHARGE
4 ABC 1 N PQRABC 0 CHARGE
2130&&-4&-6&&-8 0 ABC 1 N DEFABC 0 CHARGE
1 ABC 1 N GHIABC 0 CHARGE
Hence, I have followed the 2 steps below to achieve the desired output.
Step 1: Read the first column, then fill each empty first-column value downward with the previous non-empty value:
awk 'a=/^ /{$0=(x)substr($0,length(x)+1)}!a{x=$1}1' Input.txt > Op_Step1.txt
Op_Step1.txt
20 0 ABC 1 N DEFABC 0 CHARGE
20 1 ABC 1 N GHIABC 0 CHARGE
20 2 ABC 1 N JKLABC 0 CHARGE
20 3 ABC 1 N MNOABC 0 CHARGE
20 4 ABC 1 N PQRABC 0 CHARGE
2130&&-4&-6&&-8 0 ABC 1 N DEFABC 0 CHARGE
2130&&-4&-6&&-8 1 ABC 1 N GHIABC 0 CHARGE
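For readers who find the Step 1 one-liner dense, the following is a spelled-out sketch of the same fill-down idea. The function name fill_down and the variable name last are hypothetical, and it assumes, like the one-liner, that a line with an empty first column starts with at least as many blanks as the previous first-column value is long.

```sh
# Hypothetical helper: copy the previous first-column value onto
# lines whose first column is blank, keeping the original spacing.
fill_down() {
    awk '
        /^ / {                       # first column is empty on this line
            $0 = last substr($0, length(last) + 1)
            print
            next
        }
        { last = $1; print }         # remember the latest non-empty value
    '
}

printf '20   0 ABC\n     1 ABC\n' | fill_down
# prints:
# 20   0 ABC
# 20   1 ABC
```

Overwriting the leading characters of $0, rather than assigning to $1, leaves the remaining columns where they were.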
Step 2: Read the first field, then generate the sequence based on the "&-" and "&&-" delimiters from Op_Step1.txt.
Thanks to EdMorton for the script below:
$ awk -f tst.awk Op_Step1.txt
Since the input above is separated by neither commas (FS=",") nor tabs (FS="\t"), the script below does not work:
BEGIN{ FS="\t" }
{
    for (i=1;i<=NF;i++) {
        if ($i == "") {
            i++
            $i = $1 - $i
            for (j=(prev+1);j<$i;j++) {
                print j
            }
        }
        else if ($i < 0) {
            $i = $1 - $i
        }
        print $i
        prev = $i
    }
}
Desired Output:
20 0 ABC 1 N DEFABC 0 CHARGE
20 1 ABC 1 N GHIABC 0 CHARGE
20 2 ABC 1 N JKLABC 0 CHARGE
20 3 ABC 1 N MNOABC 0 CHARGE
20 4 ABC 1 N PQRABC 0 CHARGE
2130 0 ABC 1 N DEFABC 0 CHARGE
2131 0 ABC 1 N DEFABC 0 CHARGE
2132 0 ABC 1 N DEFABC 0 CHARGE
2133 0 ABC 1 N DEFABC 0 CHARGE
2134 0 ABC 1 N DEFABC 0 CHARGE
2136 0 ABC 1 N DEFABC 0 CHARGE
2137 0 ABC 1 N DEFABC 0 CHARGE
2138 0 ABC 1 N DEFABC 0 CHARGE
2130 1 ABC 1 N GHIABC 0 CHARGE
2131 1 ABC 1 N GHIABC 0 CHARGE
2132 1 ABC 1 N GHIABC 0 CHARGE
2133 1 ABC 1 N GHIABC 0 CHARGE
2134 1 ABC 1 N GHIABC 0 CHARGE
2136 1 ABC 1 N GHIABC 0 CHARGE
2137 1 ABC 1 N GHIABC 0 CHARGE
2138 1 ABC 1 N GHIABC 0 CHARGE
Any suggestions? Sorry for the lengthy post!
Update Comments:
NR==1 || !NF { next }    # AVN: skip the header or blank lines

/^[[:digit:]]/ {         # AVN: match lines whose first field starts with [0-9]
    blanks = range = $1  # AVN: assign when the line begins with [0-9] and does not start with a blank
                         # EM: saves the value of $1 in variable "range" and also saves it in variable "blanks"
    gsub(/./," ",blanks) # AVN: to fill the empty field with the previously assigned value
                         # EM: replaces every character in the variable "blanks" with a blank character.
    $0 = blanks substr($0,length(blanks)+1)  # AVN: not able to understand
        # EM: Replaces $1 with a string of the same length but all-blanks so that when we
        # later need to change "2130&&-4&-6&&-8" to "2130", "2131", etc. we will not have
        # to deal with the original string "2130&&-4&-6&&-8" still being present in $0.
        # Remember we saved the original $1 value in the variable "range" so
        # it is OK to overwrite the characters in $0 now. We do not simply re-assign
        # $1 as that would cause $0 to be recompiled using the current OFS value and
        # so destroy all of your original spacing.
}

{
    split(range,arr,/&/)       # AVN: split on & and store the values in the array arr
    for (i=1;i in arr;i++) {   # AVN: loop over the elements of arr
        if (arr[i] == "") {    # AVN: not able to follow the array logic below
            # EM: split("2130&&-4&-6&&-8",arr,/&/) populates arr as
            # arr[1]="2130", arr[2]="", arr[3]="-4", arr[4]="-6", arr[5]="", arr[6]="-8"
            # That should help you understand the loop logic - if in doubt add prints
            # to dump array and other variable values then update your comments.
            i++
            for (j=(prev+1);j<(arr[1]-arr[i]);j++) {
                print j substr($0,length(j)+1)
            }
        }

        if (arr[i] < 0) {
            arr[i] = arr[1] - arr[i]
        }

        print arr[i] substr($0,length(arr[i])+1)
        prev = arr[i]
    }
}
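EM's note about OFS in the comments above can be checked with a small experiment. The sketch below uses a made-up sample line with several blanks between columns, and the function names rebuild_with_ofs and overwrite_in_place are hypothetical. It contrasts assigning to $1 with the blank-then-overwrite approach used in the script.

```sh
# Assigning to $1 makes awk rebuild $0 with OFS (a single space by default).
rebuild_with_ofs() {
    awk '{ $1 = "2130"; print }'
}

# Blank out the first field, then overwrite only the leading characters.
overwrite_in_place() {
    awk '{
        blanks = $1
        gsub(/./, " ", blanks)                      # same length, all blanks
        $0 = blanks substr($0, length(blanks) + 1)  # first field is now blank
        v = "2130"
        print v substr($0, length(v) + 1)           # later columns stay put
    }'
}

printf '%s\n' '2130&&-4   0  ABC' | rebuild_with_ofs      # 2130 0 ABC
printf '%s\n' '2130&&-4   0  ABC' | overwrite_in_place    # 2130       0  ABC
```

In the second form the 0 and ABC columns keep their original positions, which is why the script avoids touching $1 directly.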
Upvotes: 0
Views: 126
Reputation: 203684
In the script you got from me, instead of setting FS to & and looping on the fields, do split($1,arr,/&/) and loop on the elements of arr.
Since you put effort into doing it yourself and got close and the remaining details aren't completely obvious, here's the full script:
$ cat tst.awk
NR==1 || !NF { next }
/^[[:digit:]]/ {
    blanks = range = $1
    gsub(/./," ",blanks)
    $0 = blanks substr($0,length(blanks)+1)
}
{
    split(range,arr,/&/)
    for (i=1;i in arr;i++) {
        if (arr[i] == "") {
            i++
            for (j=(prev+1);j<(arr[1]-arr[i]);j++) {
                print j substr($0,length(j)+1)
            }
        }
        if (arr[i] < 0) {
            arr[i] = arr[1] - arr[i]
        }
        print arr[i] substr($0,length(arr[i])+1)
        prev = arr[i]
    }
}
.
$ cat file
DIGITS AL DEST CHI CNT NEDEST CORG NCHA
20 0 ABC 1 N DEFABC 0 CHARGE
1 ABC 1 N GHIABC 0 CHARGE
2 ABC 1 N JKLABC 0 CHARGE
3 ABC 1 N MNOABC 0 CHARGE
4 ABC 1 N PQRABC 0 CHARGE
2130&&-4&-6&&-8 0 ABC 1 N DEFABC 0 CHARGE
1 ABC 1 N GHIABC 0 CHARGE
.
$ awk -f tst.awk file
20 0 ABC 1 N DEFABC 0 CHARGE
20 1 ABC 1 N GHIABC 0 CHARGE
20 2 ABC 1 N JKLABC 0 CHARGE
20 3 ABC 1 N MNOABC 0 CHARGE
20 4 ABC 1 N PQRABC 0 CHARGE
2130 0 ABC 1 N DEFABC 0 CHARGE
2131 0 ABC 1 N DEFABC 0 CHARGE
2132 0 ABC 1 N DEFABC 0 CHARGE
2133 0 ABC 1 N DEFABC 0 CHARGE
2134 0 ABC 1 N DEFABC 0 CHARGE
2136 0 ABC 1 N DEFABC 0 CHARGE
2137 0 ABC 1 N DEFABC 0 CHARGE
2138 0 ABC 1 N DEFABC 0 CHARGE
2130 1 ABC 1 N GHIABC 0 CHARGE
2131 1 ABC 1 N GHIABC 0 CHARGE
2132 1 ABC 1 N GHIABC 0 CHARGE
2133 1 ABC 1 N GHIABC 0 CHARGE
2134 1 ABC 1 N GHIABC 0 CHARGE
2136 1 ABC 1 N GHIABC 0 CHARGE
2137 1 ABC 1 N GHIABC 0 CHARGE
2138 1 ABC 1 N GHIABC 0 CHARGE
Upvotes: 4