user1773603
user1773603

Reputation:

How to compute the sum of a list of integer in Bash

I have an issue about the computation in Bash shell of the total sum of integers that come from a file. The integers that I want to sum are located on the last field with awk ($NF) of each row.

Technically, here is my Bash commands :

sum=0
for i in $(grep 'number_vars' file.txt | sed 's/_//g;s/,/./g;s/\.00//g' | awk '{print $NF}'); do sum=$((sum+i)); done

Unfortunately, this last command produces an error : syntax error: invalid arithmetic operatorand I don't understand why.

So, I have continued to try fixing this error by storing all the integers into a file temp.dat as :

for i in $(grep 'numbers_vars' file.txt | sed 's/_//g;s/,/./g;s/\.00//g' | awk '{print $NF}'); do echo $i >> temp.dat; done

I get the following content for temp.dat :

$ cat temp.dat 
    500
    110
    300
    110
    110
    110
    1500
    110
    1500
    110
    110
    110
    3000
    110
    110
    110
    3000
    3000
    110
    110

Then, if I do :

sum=0
for i in $(cat temp.dat); do sum=$((sum+i)); done

always the same error : syntax error: invalid arithmetic operator

I have also tried : sum=$(($sum+$i)) but without success.

I don't know what to do now.

UPDATE 1 : WILSON corresponds to last name :

$ LC_ALL=C cat -vt file_in.txt | grep WILSON 

WILSON PETER "^I^I500,00^M
WILSON PETER MR WILSON J MR WILSON PETER "^I^I110,00^M
WILSON PETER "^I^I300,00^M
MR WILSON CHRISTOPHE VIREMENT S ^M
WILSON PETER MR WILSON J MR WILSON PETER "^I^I110,00^M
MR WILSON CHRISTOPHE VIREMENT S ^M
WILSON PETER MR WILSON J MR WILSON PETER "^I^I110,00^M
WILSON PETER MR WILSON J MR WILSON PETER "^I^I110,00^M
WILSON PETER "^I^I1_500,00^M
WILSON PETER MR WILSON J MR WILSON PETER "^I^I110,00^M
WILSON PETER "^I^I1_500,00^M
WILSON PETER MR WILSON J MR WILSON PETER "^I^I110,00^M
WILSON PETER MR WILSON J MR WILSON PETER "^I^I110,00^M
WILSON PETER MR WILSON J MR WILSON PETER "^I^I110,00^M
WILSON PETER "^I^I3_000,00^M
...

UPDATE 2 : SOLUTION I found out how to get a good formatted text file (.txt), i.e a file where each row correspond to a row of the initial Excel table. I must conclude that exporting tool of Excel 2019 on my MacOS Catalina does really bullshit : I realized by editing the text file generated, everyting is mixed, i.e a line may be a part of the entire row of Excel table etc... Finally, I couldn't do anything with this bad formatting.

SOLUTION : if it can help someone who wants to process quickly an Excel table with bash functions, below what I did.

1) First export the Excel table to a PDF file
2) Open PDF file with Acrobat and export it to "raw text"
3) Then, I can check with `vim` is well formatted, i.e "a line in Excel table" equal to "one line in text file
4) After this checking, you can apply the different solutions suggested by all the people who helped me to compute the sum of the desired column.
For example, by following the procedure above from 1) to 4), I have just to do : 

awk '{sum+=$2}END{print sum}' file.txt

Finally, don't forget this is not an issue about my terminal commands, which are correct, but rather about the bad text file generating by Excel.

So I found this way to circumvent the issue by using an intermediary tool (Acrobat). I agree this is a little tricky but I didn't manage to do better with the only text exporting tool of Excel.

Upvotes: 1

Views: 502

Answers (4)

rici
rici

Reputation: 241671

Your basic problem is that your file has Windows line endings (CR-LF) and the CR (aka \r aka control-M) looks like an ordinary character to bash. You can get rid of them the same way you get rid of the underscores, using sed. Or you can use the dos2unix tool or filter the file through tr -d '\r'. Also, many text editors will do this for you.

Using a bunch of tools for a simple modification like this is not the most efficient, since you can do the whole transformation and the sum in awk.

Upvotes: 0

Picaud Vincent
Picaud Vincent

Reputation: 10982

from your temp.dat file you can try:

awk '{s+=$1} END {print s}' temp.dat

Upvotes: 1

KamilCuk
KamilCuk

Reputation: 140930

Try:

grep WILSON file_in.txt | sed 's/.*\x09//;s/\r//' | awk '{s+=$0}END{print s}'
  • s/.*\x09//; - remove everything up to and including a \x09 character.
  • s/\r// - remove the dos line ending from the end of the file

would be faster to filter in awk:

sed 's/.*\x09//;s/\r//' file_in.txt | awk '/WILSON/{s+=$0}END{print s}'

Upvotes: 0

Adam Katz
Adam Katz

Reputation: 16118

I don't have visibility into the original input data, but you should be able to do this all in awk:

sum=$(awk '
        {
          gsub(/,+/, ".", $NF);
          gsub(/[^0-9.]+/, "", $NF);
          sum += $NF;
        } END {
          print sum;
        }
     ' file.txt)

(You can collapse white space and unwrap that onto a single line if you want.)

This assembles your logic entirely in awk, including the addition, printing only the final sum.

Note, awk can handle decimals, so I've left them intact. POSIX shell and bash can not handle decimals. Your s/\.00//g is odd to me since it'll turn 5.004 into 54 but perhaps you never have precision beyond the hundredths? And that precision is always .00?

AWK logic explained line by line:

  • No condition (always run for every line):
    • Replace all consecutive commas with a dot (last field only)
    • Remove all characters that are neither a digit nor a dot (last field only)
      → This will solve any issues you have with invisible characters
    • Add the value to a sum variable (which is auto-initialized as zero)
  • After all lines are read:
    • Print that sum

If you need that final value to be an integer, say because you'll do further bash arithmetic or numeric comparisons, replace that print line with either printf "%d", sum (truncate) or printf "%.0f", sum (round).

If this doesn't solve your issue, I'd like to see a hex dump of your input using hd file.txt

Upvotes: 0

Related Questions