Awk script to provide sum of certain columns in another column basis on criteria

Question

Need your help, have one file where data is as shown below. Data for both the below scenarios is present in only 1(single) file and want the expected output in same file only if possible

Scenario 1:

If the value in the first column DocumentNo is appearing once and the second column Line has the value 10, then I would like to sum columns 3,4,5 and 6 (Taxablevalue,IGSTAmount,CGSTAm and SGSTAmo) and place/replace this value which we have summed in the eight column Invoicevalue:

example data:

DocumentNo|Line|Taxablevalue|IGSTAmount|CGSTAm|SGSTAmo|OthTa|InvoiceValue
262881894|10|10000|0|900|900||

Senario 2:

If we have multiple rows with identical values in the first column DocumentNo and unique value in the second column LineN, then I would like to sum all value of columns 3,4,5 and 6 (Taxablevalue,IGSTAmount,CGSTAm and SGSTAmo) and place/replace this value which we have summed in the eight column Invoicevalue of each line.

example data:

DocumentNo|LineN|Taxablevalue|IGSTAmo|CGSTAmo|SGSTAmou|OthTa|InvoValue
262881894|10|10000|0|900|900||              
262881894|20|15000|0|1350|1350||            
262881894|30|20000|0|1800|1800||

Expected output Scenario 1:

DocumentNo|Line|Taxablevalue|IGSTAmount|CGSTAm|SGSTAmo|OthTa|InvoiceValue
 262881894|10|10000|0|900|900||11800

Expected output Scenario 2:

Invoice Value = 10000+15000+20000+0+0+0+900+1350+1800+900+1350+1800 = 53100

DocumentNo|LineN|Taxablevalue|IGSTAmo|CGSTAmo|SGSTAmou|OthTa|InvoValue
  262881894|10|10000|0|900|900||53100  
  262881894|20|15000|0|1350|1350||53100 
  262881894|30|20000|0|1800|1800||53100

Below is the code tried, but not able to figure out how to put added values in lastcolumn(InvoValue)

awk '{a[$1]+=$3;b[$1]+=$4;c[$1]+=$5;d[$1]+=$6;}
      END {for(i in a) { print " " a[i] " " b[i] " " c[i] " " d[i];}}' File

Below is output of code that I'm getting. Sadly it is not matching my expected output :

0 0 0 0

Andriy Makukha · Accepted Answer

I would do it in two passes.

On the first pass I would create a dictinary s that would hold the sum of columns 3, 4, 5 and 6 for any specific document number.

On the second pass I would replace the value in InvoValue column.

Here's an example input data.txt:

DocumentNo|LineN|Taxablevalue|IGSTAmo|CGSTAmo|SGSTAmou|OthTa|InvoValue
262881894|10|10000|0|900|900||
262881894|20|15000|0|1350|1350||
262881894|30|20000|0|1800|1800||
262881895|10|10000|0|900|900||

Here is the command:

gawk 'BEGIN { OFS=FS="|" } NR == FNR { s[$1] += $3+$4+$5+$6; next } FNR!=1 { $8 = s[$1] } 1;' data.txt data.txt

Here is the output:

DocumentNo|LineN|Taxablevalue|IGSTAmo|CGSTAmo|SGSTAmou|OthTa|InvoValue
262881894|10|10000|0|900|900||53100
262881894|20|15000|0|1350|1350||53100
262881894|30|20000|0|1800|1800||53100
262881895|10|10000|0|900|900||11800

Note that I ignored column 2 altogether. You might need to modify my answer if you want to account for the LineN.

To ensure that all pairs (DocumentNo, LineN) are unique and occur only once, you could add this error detection:

if (met[$1 FS $2]) print "ERROR: " $1 " " $2; 
met[$1 FS $2] = 1;

So the updated command with error detection would be:

gawk 'BEGIN { OFS=FS="|" } NR == FNR { if (met[$1 FS $2]) print "ERROR: " $1 " " $2; met[$1 FS $2] = 1; s[$1] += $3+$4+$5+$6; next } FNR!=1 { $8 = s[$1] } 1;' data.txt data.txt

Awk script to provide sum of certain columns in another column basis on criteria

Answers (1)

Related Questions