larry_opoly
larry_opoly

Reputation: 35

Use Awk to sum a field using regex and loop

New to Awk and scripting/programming and asking questions. Hoping to sum a field from a monthly, system generated text file (on Windows10 using WSL1) using Awk. The targeted lines look like this:

   Client No.   Client Name                     O/S Balance        Ledger1      Ledger2      Ledger3        Ledger4      Ledger5      Ledger6      Ledger7 Comments
  C00716427.1  Queensview Ohio LLC.              888,924.35           0.00         0.00         0.00     888,924.35     1,803.21         0.00     2,499.96-admin fee
  C00716576.3  0140-8487 Quebec Inc            6,260,987.91           0.00         0.00         0.00   6,260,987.91    18,418.34         0.00        20.99-May 01/20 w/c
  C00716868.1  0328-2400 Quebec Inc.           1,183,948.05           0.00         0.00         0.00   1,183,948.05     2,155.69     5,211.35-1,198,846.32-April 29/20 p/o
  C00617231.3  Ricky Baker Enterprise         49,593,446.91           0.00         0.00         0.00  49,593,446.91    83,220.21   442,202.51-      105.14-May 01/20 w/c
  C00617316.3  101287388 Sarasota LLC.         3,089,599.64           0.00         0.00         0.00   3,089,599.64     2,604.15    27,134.56-        6.08-May 01/20 w/c
  C00617447.2  AmeriUnion Trade 123            6,088,229.78           0.00         0.00         0.00   6,088,229.78    24,921.06         0.00       325.26-surplus funds

I would like the output to be only the values of Client No. and Ledger7 (no need for the headers). By piecing together some other solutions, this is what I've mashed together:

awk '/C00/ {for(i=1;i<=NF;i++){gsub ( ",","" ); if($i ~ /[0-9]\-[a-zA-Z]/){print substr($1,4,8) " " $i} } }' April.txt

Which produces:

716427.1 2499.96-admin
716576.3 20.99-May
716868.1 5211.35-1198846.32-April
617231.3 105.14-May
617316.3 6.08-May
617447.2 325.26-surplus

FYI - there is always a "-" separating Ledger7 and the comments. The issue for me occurs when Ledger6 is a negative value and Ledger7 is >= $1 million.

Desired output to achieve the sum of 1201803.75:

716427.1 2499.96
716576.3 20.99
716868.1 1198846.32
617231.3 105.14
617316.3 6.08
617447.2 325.26

Any ideas on how to adapt or restructure my script? Let me know if further details are needed. Thanks in advance.

Upvotes: 3

Views: 275

Answers (2)

RavinderSingh13
RavinderSingh13

Reputation: 133600

Could you please try following, written and tested with shown samples. Written and tested in link https://ideone.com/3oYBGq

awk '
match($0,/[-[:blank:]]([0-9][0-9,.]*)-[^-]+$/){
  val=substr($0,RSTART,RLENGTH)
  sub(/^ +/,"",val)
  sub(/ +.*$/,"",val)
  num=split(val,arr,"-")
  print $1,arr[num-1]
  val=""
}' Input_file

Explanation: Adding detailed explanation for above.

awk '                                                ##Starting awk program from here.
match($0,/[-[:blank:]]([0-9][0-9,.]*)-[^-]+$/){      ##Using match function to match regex [-[:blank:]]([0-9][0-9,.]*)-[^-]+ till end of line.
  val=substr($0,RSTART,RLENGTH)                      ##Creating val which has substring of matched regex value in current line as shown above.
  sub(/^ +/,"",val)                                  ##Substituting starting space with NULL in val here.
  sub(/ +.*$/,"",val)                                ##Substituting space till everything in last of line in val here.
  num=split(val,arr,"-")                             ##Splitting val value into array arr with delimoter - here.
  print $1,arr[num-1]                                ##Printing 1st field and 2nd last value of arr here.
  val=""                                             ##Nullify val here.
}' Input_file                                        ##Mentioning Input_file name here.

Upvotes: 3

anubhava
anubhava

Reputation: 785406

Using gnu awk, you may can do this:

awk 'NR>1 {
   amt = gensub(/^.*[-[:blank:]]([0-9][0-9,.]*)-[^-]+$/, "\\1", 1)
   gsub(/,/, "", amt)
   print $1, amt
}' file
C00716427.1 2499.96
C00716576.3 20.99
C00716868.1 1198846.32
C00617231.3 105.14
C00617316.3 6.08
C00617447.2 325.26

How it works:

  • NR > 1 ignore header row
  • gensub function matches and captures last amount file that comes after a space or hyphen and is followed by a hyphen.
  • gsub function removes all commas from amount field
  • print simply prints 1st column with amount

Upvotes: 3

Related Questions