Reputation: 40
Let's say a file has:
abc[1:0]
2 abc
abc 3
[1:0] abc
I have a huge file like this. I want to sum all the numbers inside the brackets and add the number of lines, as shown below.
Note that numbers outside the brackets should not be counted.
1+1+numberoflines
In this case, 1+1+4 = 6.
How do I do it?
I tried a number of approaches, such as:
perl -nle '$sum+=$_} END { print $sum' test1.txt
or
n=$1
sum=0
sd=0
while [ $n -gt 0 ]
do
sd=`expr $n % 10`
sum=`expr $sum + $sd`
n=`expr $n / 10`
done
echo "Sum of digits for number is $sum"
But neither of them picks up numbers that are not separated by spaces.
Note that abc is just an example. It could be any random text mixed with numbers.
Upvotes: 1
Views: 135
Reputation: 1758
Using GNU grep and sed (the \+ in the patterns is a GNU extension), the following works:
echo $(( $(grep -o '\[.\+\]' test1.txt | sed -e 's/[^0-9]\+/\+/g' -e 's/^\+//g' ; cat test1.txt | wc -l) ))
If brackets can appear multiple times in a line, use sed instead of grep:
echo $(( $(sed -n -e '{s/.*\(\[.\+\]\).*/\1/g;T;p}' test1.txt | sed -e 's/[^0-9]\+/\+/g' -e 's/^\+//g'; cat test1.txt | wc -l) ))
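To see why the first pipeline yields 6, it may help to look at the expression it hands to $(( )). The following is only a rough Python imitation of that grep/sed/wc chain, not the commands themselves; the function name build_expression and the in-memory sample list are made up for illustration, and it assumes the file ends with a newline so that wc -l equals the number of lines.

```python
import re

def build_expression(lines):
    """Return the arithmetic string the grep | sed | wc pipeline would produce."""
    parts = []
    for line in lines:
        # Greedy match from the first '[' to the last ']' on the line,
        # like grep -o '\[.\+\]'
        m = re.search(r'\[.+\]', line)
        if m:
            # Every run of non-digits becomes '+': "[1:0]" -> "+1+0+"
            expr = re.sub(r'[^0-9]+', '+', m.group(0))
            # Drop the leading '+': "+1+0+" -> "1+0+"
            parts.append(expr.lstrip('+'))
    # The line count is appended last (assumes a trailing newline in the file)
    parts.append(str(len(lines)))
    return ' '.join(parts)

sample = ['abc[1:0]', '2 abc', 'abc 3', '[1:0] abc']
expression = build_expression(sample)
print(expression)

# Add up the terms instead of evaluating the string
total = sum(int(t) for t in re.findall(r'\d+', expression))
print(total)
```

Each trailing '+' is what joins one bracket's numbers to the next term, so the line count ends up as the final operand. Because the match is greedy, a line with two bracket groups would also pull in any digits sitting between them.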
Upvotes: 0
Reputation: 103714
This works as described:
echo 'abc[1:0]
2 abc
abc 3
[1:0] abc' | perl -lnE 'while (/\[([^]]*)\]/g) {
    $s=$1;
    while ($s=~/\b(\d+)\b/g) {
        $sum+=$1;
    }
}
END {
    say $sum+$.
}
'
Prints 6
To understand it, insert some say statements at appropriate places:
echo 'abc[1:0]
2 abc
abc 3
[1:0] abc' | perl -lnE 'while (/\[([^]]*)\]/g) {
    $s=$1;
    say $s;
    while ($s=~/\b(\d+)\b/g) {
        say $1;
        $sum+=$1;
    }
}
END {
    say $.;
    say $sum+$.
}
'
1:0     first bracketed group from /\[([^]]*)\]/g
1       digits within from $s=~/\b(\d+)\b/g
0
1:0
1
0
4       line count from $.
6       $sum + line count
For a Python solution, you can use the same regex and do:
import re

total=0
with open(fn) as f: # 'fn' is the path to your file
    for i, line in enumerate(f, 1):
        if m:=re.findall(r'\[([^]]*)\]', line):
            for e in m:
                total+=sum(map(int, re.findall(r'\b(\d+)\b', e)))
print(total+i)
This regex has limitations: it does not handle unbalanced or nested brackets. Handling those would take a more complicated regex.
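If nested brackets do matter, one alternative to a more complicated regex is to track bracket depth by hand. This is only a sketch under some assumptions that are not in the question: the helper name sum_bracketed_numbers is made up, a stray ']' is ignored, an unclosed '[' counts everything up to the end of the line, and any run of ASCII digits inside brackets is counted (unlike \b(\d+)\b, it would also count the 1 in a1).

```python
def sum_bracketed_numbers(line):
    """Sum every run of ASCII digits found at bracket depth >= 1 in one line."""
    total = 0
    depth = 0      # how many '[' are currently open
    digits = ''    # digits of the number being read, if any
    for ch in line + ' ':              # trailing space flushes a final number
        if depth > 0 and '0' <= ch <= '9':
            digits += ch
            continue
        if digits:                     # a number just ended: add it
            total += int(digits)
            digits = ''
        if ch == '[':
            depth += 1
        elif ch == ']' and depth > 0:  # a stray ']' at depth 0 is ignored
            depth -= 1
    return total

sample = ['abc[1:0]', '2 abc', 'abc 3', '[1:0] abc']
result = sum(sum_bracketed_numbers(s) for s in sample) + len(sample)
print(result)
```

On the sample from the question this gives the same answer as the regex version; the difference only shows up on input such as [1 [2 3] 4] 5, where the inner group is counted along with the outer one.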
Python note:
The := (assignment expression) in if m:=re.findall(r'\[([^]]*)\]', line): requires Python 3.8 or later.
Break it into two statements for earlier Python versions:
m=re.findall(r'\[([^]]*)\]', line)
if m:
    ...
Upvotes: 1