Reputation:
I have an issue computing, in a Bash shell, the total sum of integers that come from a file. The integers I want to sum are located in the last field of each row (awk's $NF).
Specifically, here are my Bash commands:
sum=0
for i in $(grep 'number_vars' file.txt | sed 's/_//g;s/,/./g;s/\.00//g' | awk '{print $NF}'); do sum=$((sum+i)); done
Unfortunately, this last command produces an error: syntax error: invalid arithmetic operator
and I don't understand why.
So I continued trying to fix this error by storing all the integers into a file temp.dat, like this:
for i in $(grep 'numbers_vars' file.txt | sed 's/_//g;s/,/./g;s/\.00//g' | awk '{print $NF}'); do echo $i >> temp.dat; done
I get the following content for temp.dat:
$ cat temp.dat
500
110
300
110
110
110
1500
110
1500
110
110
110
3000
110
110
110
3000
3000
110
110
Then, if I do:
sum=0
for i in $(cat temp.dat); do sum=$((sum+i)); done
I always get the same error: syntax error: invalid arithmetic operator
I have also tried sum=$(($sum+$i)), but without success.
I don't know what to do now.
UPDATE 1: WILSON corresponds to the last name:
$ LC_ALL=C cat -vt file_in.txt | grep WILSON
WILSON PETER "^I^I500,00^M
WILSON PETER MR WILSON J MR WILSON PETER "^I^I110,00^M
WILSON PETER "^I^I300,00^M
MR WILSON CHRISTOPHE VIREMENT S ^M
WILSON PETER MR WILSON J MR WILSON PETER "^I^I110,00^M
MR WILSON CHRISTOPHE VIREMENT S ^M
WILSON PETER MR WILSON J MR WILSON PETER "^I^I110,00^M
WILSON PETER MR WILSON J MR WILSON PETER "^I^I110,00^M
WILSON PETER "^I^I1_500,00^M
WILSON PETER MR WILSON J MR WILSON PETER "^I^I110,00^M
WILSON PETER "^I^I1_500,00^M
WILSON PETER MR WILSON J MR WILSON PETER "^I^I110,00^M
WILSON PETER MR WILSON J MR WILSON PETER "^I^I110,00^M
WILSON PETER MR WILSON J MR WILSON PETER "^I^I110,00^M
WILSON PETER "^I^I3_000,00^M
...
UPDATE 2 - SOLUTION: I found out how to get a well-formatted text file (.txt), i.e. a file where each row corresponds to a row of the initial Excel table. I must conclude that the text export tool of Excel 2019 on my macOS Catalina does a really poor job: by editing the generated text file, I realized everything is mixed up, i.e. a line may contain only part of an entire row of the Excel table, etc. I couldn't do anything with this bad formatting.
SOLUTION: in case it helps someone who wants to quickly process an Excel table with Bash tools, here is what I did.
1) First, export the Excel table to a PDF file.
2) Open the PDF file with Acrobat and export it to "raw text".
3) Then, check with `vim` that the file is well formatted, i.e. "one line in the Excel table" equals "one line in the text file".
4) After this check, you can apply the different solutions suggested by all the people who helped me to compute the sum of the desired column.
For example, after following the procedure above from 1) to 4), I just have to do:
awk '{sum+=$2}END{print sum}' file.txt
Finally, note this was not an issue with my terminal commands, which are correct, but rather with the bad text file generated by Excel.
So I found this way to circumvent the issue by using an intermediary tool (Acrobat). I agree it is a little hacky, but I didn't manage to do better with Excel's only text export tool.
Upvotes: 1
Views: 502
Reputation: 241671
Your basic problem is that your file has Windows line endings (CR-LF), and the CR (aka \r, aka control-M) looks like an ordinary character to bash. You can get rid of the CRs the same way you get rid of the underscores, using sed. Or you can use the dos2unix tool, or filter the file through tr -d '\r'. Many text editors will also do this conversion for you.
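A minimal sketch of the failure and the tr fix (the filename crlf_demo.dat and the two values are made up for illustration; the real fix applies the same tr filter to your file):

```shell
# A number followed by a CR ("500\r") breaks $((...)) with
# "invalid arithmetic operator"; stripping the CRs fixes it.
printf '500\r\n110\r\n' > crlf_demo.dat
sum=0
for i in $(tr -d '\r' < crlf_demo.dat); do
    sum=$((sum + i))
done
echo "$sum"   # prints 610
rm -f crlf_demo.dat
```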
Using a bunch of tools for a simple modification like this is not the most efficient approach, since you can do the whole transformation and the sum in awk.
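One possible all-in-awk sketch, assuming the layout shown in UPDATE 1 (amount in the last tab-separated field, underscores as thousands separators, a decimal comma, and a trailing CR):

```shell
awk -F'\t' '/WILSON/ {
    gsub(/[_\r]/, "", $NF)   # drop underscores and the CR
    sub(/,/, ".", $NF)       # decimal comma -> decimal point
    sum += $NF
} END { print sum }' file_in.txt
```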
Upvotes: 0
Reputation: 10982
From your temp.dat file, you can try:
awk '{s+=$1} END {print s}' temp.dat
Upvotes: 1
Reputation: 140930
Try:
grep WILSON file_in.txt | sed 's/.*\x09//;s/\r//' | awk '{s+=$0}END{print s}'
s/.*\x09// - remove everything up to and including a \x09 (tab) character.
s/\r// - remove the DOS line ending from the end of the line.
It would be faster to filter in awk:
sed 's/.*\x09//;s/\r//' file_in.txt | awk '/WILSON/{s+=$0}END{print s}'
Upvotes: 0
Reputation: 16118
I don't have visibility into the original input data, but you should be able to do this all in awk:
sum=$(awk '
{
gsub(/,+/, ".", $NF);
gsub(/[^0-9.]+/, "", $NF);
sum += $NF;
} END {
print sum;
}
' file.txt)
(You can collapse white space and unwrap that onto a single line if you want.)
This assembles your logic entirely in awk, including the addition, printing only the final sum.
Note, awk can handle decimals, so I've left them intact. POSIX shell and bash cannot handle decimals. Your s/\.00//g is odd to me, since it would turn 5.004 into 54, but perhaps you never have precision beyond the hundredths, and that precision is always .00?
AWK logic explained line by line:
gsub(/,+/, ".", $NF) - replace the decimal comma in the last field with a decimal point
gsub(/[^0-9.]+/, "", $NF) - strip every character that is not a digit or a dot (underscores, quotes, the trailing CR)
sum += $NF - add the cleaned value to the sum variable (which is auto-initialized as zero)
print sum - print the total after the last line
If you need that final value to be an integer, say because you'll do further bash arithmetic or numeric comparisons, replace that print line with either printf "%d", sum (truncate) or printf "%.0f", sum (round).
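A quick illustration of the difference between the two (the value 2110.6 is an arbitrary example):

```shell
awk 'BEGIN { s = 2110.6
             printf "%d\n",   s      # truncates to 2110
             printf "%.0f\n", s }'   # rounds to 2111
```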
If this doesn't solve your issue, I'd like to see a hex dump of your input, e.g. via hd file.txt.
Upvotes: 0