Reputation: 187
Need your help, have one file where data is as shown below. Data for both the below scenarios is present in only 1(single) file and want the expected output in same file only if possible
Scenario 1:
If the value in the first column DocumentNo
is appearing once and
the second column Line
has the value 10, then I would like to sum columns 3,4,5 and 6 (Taxablevalue,IGSTAmount,CGSTAm
and SGSTAmo
) and place/replace this value which we have summed in the eight column Invoicevalue
:
example data:
DocumentNo|Line|Taxablevalue|IGSTAmount|CGSTAm|SGSTAmo|OthTa|InvoiceValue
262881894|10|10000|0|900|900||
Senario 2:
If we have multiple rows with identical values in the first column DocumentNo
and unique value in the second column LineN
, then I would like to sum all value of columns 3,4,5 and 6 (Taxablevalue,IGSTAmount,CGSTAm
and SGSTAmo
) and place/replace this value which we have summed in the eight column Invoicevalue
of each line.
example data:
DocumentNo|LineN|Taxablevalue|IGSTAmo|CGSTAmo|SGSTAmou|OthTa|InvoValue
262881894|10|10000|0|900|900||
262881894|20|15000|0|1350|1350||
262881894|30|20000|0|1800|1800||
Expected output Scenario 1:
DocumentNo|Line|Taxablevalue|IGSTAmount|CGSTAm|SGSTAmo|OthTa|InvoiceValue
262881894|10|10000|0|900|900||11800
Expected output Scenario 2:
Invoice Value = 10000+15000+20000+0+0+0+900+1350+1800+900+1350+1800 = 53100
DocumentNo|LineN|Taxablevalue|IGSTAmo|CGSTAmo|SGSTAmou|OthTa|InvoValue
262881894|10|10000|0|900|900||53100
262881894|20|15000|0|1350|1350||53100
262881894|30|20000|0|1800|1800||53100
Below is the code tried, but not able to figure out how to put added values in lastcolumn(InvoValue)
awk '{a[$1]+=$3;b[$1]+=$4;c[$1]+=$5;d[$1]+=$6;}
END {for(i in a) { print " " a[i] " " b[i] " " c[i] " " d[i];}}' File
Below is output of code that I'm getting. Sadly it is not matching my expected output :
0 0 0 0
Upvotes: 1
Views: 369
Reputation: 8304
I would do it in two passes.
On the first pass I would create a dictinary s
that would hold the sum of columns 3, 4, 5 and 6 for any specific document number.
On the second pass I would replace the value in InvoValue column.
Here's an example input data.txt
:
DocumentNo|LineN|Taxablevalue|IGSTAmo|CGSTAmo|SGSTAmou|OthTa|InvoValue
262881894|10|10000|0|900|900||
262881894|20|15000|0|1350|1350||
262881894|30|20000|0|1800|1800||
262881895|10|10000|0|900|900||
Here is the command:
gawk 'BEGIN { OFS=FS="|" } NR == FNR { s[$1] += $3+$4+$5+$6; next } FNR!=1 { $8 = s[$1] } 1;' data.txt data.txt
Here is the output:
DocumentNo|LineN|Taxablevalue|IGSTAmo|CGSTAmo|SGSTAmou|OthTa|InvoValue
262881894|10|10000|0|900|900||53100
262881894|20|15000|0|1350|1350||53100
262881894|30|20000|0|1800|1800||53100
262881895|10|10000|0|900|900||11800
Note that I ignored column 2 altogether. You might need to modify my answer if you want to account for the LineN.
To ensure that all pairs (DocumentNo, LineN) are unique and occur only once, you could add this error detection:
if (met[$1 FS $2]) print "ERROR: " $1 " " $2;
met[$1 FS $2] = 1;
So the updated command with error detection would be:
gawk 'BEGIN { OFS=FS="|" } NR == FNR { if (met[$1 FS $2]) print "ERROR: " $1 " " $2; met[$1 FS $2] = 1; s[$1] += $3+$4+$5+$6; next } FNR!=1 { $8 = s[$1] } 1;' data.txt data.txt
Upvotes: 1