enghyd1

Reputation: 40

Script to sum numbers in a file that are written without spaces next to text and special characters

Let's say a file contains:

abc[1:0]
2 abc
abc 3
[1:0] abc

I have a huge file like this, and I want to sum the numbers as shown below.

Note that numbers outside the brackets should not be counted.

1 + 1 + number of lines

In this case, 1 + 1 + 4 = 6.

How do I do it?

I tried a number of approaches, such as:

perl -nle '$sum+=$_} END { print $sum' test1.txt

or

n=$1
sum=0
sd=0
while [ $n -gt 0 ]
do
    sd=`expr $n % 10`
    sum=`expr $sum + $sd`
    n=`expr $n / 10`
done
echo "Sum of digits for number is $sum"

But neither of them picks up numbers that are written without spaces around them.

Note that abc is just an example. It could be any random text alongside the numbers.

Upvotes: 1

Views: 135

Answers (2)

etsuhisa

Reputation: 1758

Using grep and sed, you can do the following:

echo $(( $(grep -o '\[.\+\]' test1.txt | sed -e 's/[^0-9]\+/\+/g' -e 's/^\+//g' ; cat test1.txt | wc -l) ))
  1. Extract the bracketed part of each line that has one.
  2. Replace each run of non-digits with +.
  3. Get the number of lines in the file.
  4. Calculate as an arithmetic expression.

If brackets appear multiple times in a line, use sed instead of grep:

echo $(( $(sed -n -e '{s/.*\(\[.\+\]\).*/\1/g;T;p}' test1.txt | sed -e 's/[^0-9]\+/\+/g' -e 's/^\+//g'; cat test1.txt | wc -l) ))
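For comparison, the four steps listed above can be sketched in Python. This is only an illustration under assumptions: the function name bracket_sum_plus_lines and the sample string are made up here, and every run of digits inside a bracket pair is treated as one number.

```python
import re

def bracket_sum_plus_lines(text):
    """Sum the integers found inside [...] groups, then add the line count."""
    lines = text.splitlines()
    total = 0
    for line in lines:
        # Step 1: pull out each bracketed part of the line.
        for group in re.findall(r'\[[^\]]*\]', line):
            # Step 2: every run of digits inside the brackets is one number.
            total += sum(int(n) for n in re.findall(r'\d+', group))
    # Steps 3 and 4: add the number of lines to the bracketed sum.
    return total + len(lines)

# Hypothetical sample mirroring the file from the question.
sample = "abc[1:0]\n2 abc\nabc 3\n[1:0] abc\n"
print(bracket_sum_plus_lines(sample))  # 6
```

Unlike the pipeline, this handles each bracket pair on a line separately, so a line such as x[10:2] y[3] contributes 10 + 2 + 3.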

Upvotes: 0

dawg

Reputation: 103714

This works as described:

echo 'abc[1:0]
2 abc
abc 3
[1:0] abc' | perl -lnE 'while (/\[([^]]*)\]/g) { 
                            $s=$1;
                            while ($s=~/\b(\d+)\b/g) {
                                $sum+=$1;
                            }
                        }   
                        END {
                        say $sum+$.
                        }
'

Prints 6

To understand it, insert some say calls at appropriate places:

echo 'abc[1:0]
2 abc
abc 3
[1:0] abc' | perl -lnE 'while (/\[([^]]*)\]/g) { 
                            $s=$1;
                            say $s;
                            while ($s=~/\b(\d+)\b/g) {
                                say $1;
                                $sum+=$1;
                            }
                        }   
                        END {
                        say $.;
                        say $sum+$.
                        }
'
1:0      first bracketed group from /\[([^]]*)\]/g
1        digits within from $s=~/\b(\d+)\b/g
0
1:0
1
0
4        line count from $.
6        $sum + line count

For a Python solution, you can use the same regex and do:

import re

total=0
i=0                        # stays 0 if the file is empty
with open(fn) as f:        # 'fn' is the path to your file
    for i, line in enumerate(f, 1):
        if m:=re.findall(r'\[([^]]*)\]', line):
            for e in m:
                total+=sum(map(int, re.findall(r'\b(\d+)\b', e)))

print(total+i)             # bracketed sum plus line count

This regex has limitations: it will not handle unbalanced or nested brackets. Handling those would take a more complicated regex.
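If nested brackets do matter, one alternative to a more complicated regex is a small character scanner that tracks bracket depth. The sketch below is not part of the solution above and rests on assumptions: the name sum_in_brackets is made up, every run of ASCII digits counts as a number (so it is looser than the \b(\d+)\b test), a stray ] is ignored, and an unclosed [ stays open until the end of the line.

```python
def sum_in_brackets(line):
    """Sum the integers that sit inside at least one [...] level."""
    total = 0
    depth = 0      # how many '[' are currently open
    digits = ""    # digits of the number being read
    for ch in line + "\n":          # the sentinel flushes a trailing number
        if ch in "0123456789":
            digits += ch
            continue
        if digits:
            # The number just ended: count it only if it was inside brackets.
            if depth > 0:
                total += int(digits)
            digits = ""
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth = max(depth - 1, 0)   # ignore a stray closing bracket
    return total

# Hypothetical sample mirroring the file from the question.
sample = ["abc[1:0]", "2 abc", "abc 3", "[1:0] abc"]
print(sum(sum_in_brackets(s) for s in sample) + len(sample))  # 6
```

With nesting, a line such as a[1[2]3] 4 gives 1 + 2 + 3, and the 4 outside the brackets is skipped.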

Python note:

The := in if m:=re.findall(r'\[([^]]*)\]', line): requires Python 3.8 or later. Break it into two statements for earlier Python versions:

m=re.findall(r'\[([^]]*)\]', line)
if m:
    ...

Upvotes: 1
