Reputation: 35
New to Awk and scripting/programming and asking questions. Hoping to sum a field from a monthly, system generated text file (on Windows10 using WSL1) using Awk. The targeted lines look like this:
Client No. Client Name O/S Balance Ledger1 Ledger2 Ledger3 Ledger4 Ledger5 Ledger6 Ledger7 Comments
C00716427.1 Queensview Ohio LLC. 888,924.35 0.00 0.00 0.00 888,924.35 1,803.21 0.00 2,499.96-admin fee
C00716576.3 0140-8487 Quebec Inc 6,260,987.91 0.00 0.00 0.00 6,260,987.91 18,418.34 0.00 20.99-May 01/20 w/c
C00716868.1 0328-2400 Quebec Inc. 1,183,948.05 0.00 0.00 0.00 1,183,948.05 2,155.69 5,211.35-1,198,846.32-April 29/20 p/o
C00617231.3 Ricky Baker Enterprise 49,593,446.91 0.00 0.00 0.00 49,593,446.91 83,220.21 442,202.51- 105.14-May 01/20 w/c
C00617316.3 101287388 Sarasota LLC. 3,089,599.64 0.00 0.00 0.00 3,089,599.64 2,604.15 27,134.56- 6.08-May 01/20 w/c
C00617447.2 AmeriUnion Trade 123 6,088,229.78 0.00 0.00 0.00 6,088,229.78 24,921.06 0.00 325.26-surplus funds
I would like the output to be only the values of Client No. and Ledger7 (no need for the headers). By piecing together some other solutions, this is what I've mashed together:
awk '/C00/ {for(i=1;i<=NF;i++){gsub ( ",","" ); if($i ~ /[0-9]\-[a-zA-Z]/){print substr($1,4,8) " " $i} } }' April.txt
Which produces:
716427.1 2499.96-admin
716576.3 20.99-May
716868.1 5211.35-1198846.32-April
617231.3 105.14-May
617316.3 6.08-May
617447.2 325.26-surplus
FYI - there is always a "-" separating Ledger7 and the comments. The issue for me occurs when Ledger6 is a negative value and Ledger7 is >= $1 million.
Desired output to achieve the sum of 1201803.75
:
716427.1 2499.96
716576.3 20.99
716868.1 1198846.32
617231.3 105.14
617316.3 6.08
617447.2 325.26
Any ideas on how to adapt or restructure my script? Let me know if further details are needed. Thanks in advance.
Upvotes: 3
Views: 275
Reputation: 133600
Could you please try following, written and tested with shown samples. Written and tested in link https://ideone.com/3oYBGq
awk '
match($0,/[-[:blank:]]([0-9][0-9,.]*)-[^-]+$/){
val=substr($0,RSTART,RLENGTH)
sub(/^ +/,"",val)
sub(/ +.*$/,"",val)
num=split(val,arr,"-")
print $1,arr[num-1]
val=""
}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
match($0,/[-[:blank:]]([0-9][0-9,.]*)-[^-]+$/){ ##Using match function to match regex [-[:blank:]]([0-9][0-9,.]*)-[^-]+ till end of line.
val=substr($0,RSTART,RLENGTH) ##Creating val which has substring of matched regex value in current line as shown above.
sub(/^ +/,"",val) ##Substituting starting space with NULL in val here.
sub(/ +.*$/,"",val) ##Substituting space till everything in last of line in val here.
num=split(val,arr,"-") ##Splitting val value into array arr with delimoter - here.
print $1,arr[num-1] ##Printing 1st field and 2nd last value of arr here.
val="" ##Nullify val here.
}' Input_file ##Mentioning Input_file name here.
Upvotes: 3
Reputation: 785406
Using gnu awk
, you may can do this:
awk 'NR>1 {
amt = gensub(/^.*[-[:blank:]]([0-9][0-9,.]*)-[^-]+$/, "\\1", 1)
gsub(/,/, "", amt)
print $1, amt
}' file
C00716427.1 2499.96
C00716576.3 20.99
C00716868.1 1198846.32
C00617231.3 105.14
C00617316.3 6.08
C00617447.2 325.26
How it works:
NR > 1
ignore header rowgensub
function matches and captures last amount file that comes after a space or hyphen and is followed by a hyphen.gsub
function removes all commas from amount fieldprint
simply prints 1st column with amountUpvotes: 3