Use Awk to sum a field using regex and loop

Question

New to Awk and scripting/programming and asking questions. Hoping to sum a field from a monthly, system generated text file (on Windows10 using WSL1) using Awk. The targeted lines look like this:

   Client No.   Client Name                     O/S Balance        Ledger1      Ledger2      Ledger3        Ledger4      Ledger5      Ledger6      Ledger7 Comments
  C00716427.1  Queensview Ohio LLC.              888,924.35           0.00         0.00         0.00     888,924.35     1,803.21         0.00     2,499.96-admin fee
  C00716576.3  0140-8487 Quebec Inc            6,260,987.91           0.00         0.00         0.00   6,260,987.91    18,418.34         0.00        20.99-May 01/20 w/c
  C00716868.1  0328-2400 Quebec Inc.           1,183,948.05           0.00         0.00         0.00   1,183,948.05     2,155.69     5,211.35-1,198,846.32-April 29/20 p/o
  C00617231.3  Ricky Baker Enterprise         49,593,446.91           0.00         0.00         0.00  49,593,446.91    83,220.21   442,202.51-      105.14-May 01/20 w/c
  C00617316.3  101287388 Sarasota LLC.         3,089,599.64           0.00         0.00         0.00   3,089,599.64     2,604.15    27,134.56-        6.08-May 01/20 w/c
  C00617447.2  AmeriUnion Trade 123            6,088,229.78           0.00         0.00         0.00   6,088,229.78    24,921.06         0.00       325.26-surplus funds

I would like the output to be only the values of Client No. and Ledger7 (no need for the headers). By piecing together some other solutions, this is what I've mashed together:

awk '/C00/ {for(i=1;i<=NF;i++){gsub ( ",","" ); if($i ~ /[0-9]\-[a-zA-Z]/){print substr($1,4,8) " " $i} } }' April.txt

Which produces:

716427.1 2499.96-admin
716576.3 20.99-May
716868.1 5211.35-1198846.32-April
617231.3 105.14-May
617316.3 6.08-May
617447.2 325.26-surplus

FYI - there is always a "-" separating Ledger7 and the comments. The issue for me occurs when Ledger6 is a negative value and Ledger7 is >= $1 million.

Desired output to achieve the sum of 1201803.75:

716427.1 2499.96
716576.3 20.99
716868.1 1198846.32
617231.3 105.14
617316.3 6.08
617447.2 325.26

Any ideas on how to adapt or restructure my script? Let me know if further details are needed. Thanks in advance.

anubhava · Accepted Answer

Using gnu awk, you may can do this:

awk 'NR>1 {
   amt = gensub(/^.*[-[:blank:]]([0-9][0-9,.]*)-[^-]+$/, "\1", 1)
   gsub(/,/, "", amt)
   print $1, amt
}' file

C00716427.1 2499.96
C00716576.3 20.99
C00716868.1 1198846.32
C00617231.3 105.14
C00617316.3 6.08
C00617447.2 325.26

How it works:

NR > 1 ignore header row
gensub function matches and captures last amount file that comes after a space or hyphen and is followed by a hyphen.
gsub function removes all commas from amount field
print simply prints 1st column with amount

Use Awk to sum a field using regex and loop

Answers (2)

Related Questions