Reputation: 149
Sorry if this is a very basic question, but I have tried to solve this on my own for some time and searched around (I tried the "map" function, etc.) without finding a solution. Maybe it's a small mistake somewhere, but I am new to Python and seem to have some sort of tunnel vision.
I have some text (see sample) with numbers in between the words. I want to extract all the numbers into a list with a regular expression and then sum them. The extraction seems to work, but I struggle to convert the numbers to integers and then sum them.
import re
df = ["test 4497 test 6702 test 8454 test",
      "7449 test"]
numlist = list()
for line in df:
    line = line.rstrip()
    numbers = re.findall("[0-9]+", line)  # find numbers
    if len(numbers) < 1: continue  # ignore lines with no numbers, none in this sample
    numlist.append(numbers)  # create list of numbers
Calling sum(numlist) afterwards raises an error.
Upvotes: 0
Views: 2777
Reputation: 44013
This is the source of your problem: re.findall returns a list, which you are appending to numlist, itself a list. So you end up with a list of lists. You should instead do:
numlist.extend(numbers)
So that you end up with a single list of numbers (well, actually string representations of numbers). Then you can convert the strings to integers and sum:
the_sum = sum(int(n) for n in numlist)
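Putting both changes together, the loop from the question might look like the following. This is only a sketch that reuses the question's sample data; the empty-line check is dropped because extending with an empty list is a no-op.

```python
import re

df = ["test 4497 test 6702 test 8454 test",
      "7449 test"]

numlist = []
for line in df:
    # findall returns a list of digit strings for this line (possibly empty)
    numbers = re.findall(r"[0-9]+", line)
    # extend adds each string individually instead of nesting the whole list
    numlist.extend(numbers)

# numlist still holds strings, so convert each one before summing
the_sum = sum(int(n) for n in numlist)
print(numlist)  # ['4497', '6702', '8454', '7449']
print(the_sum)  # 27102
```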
Upvotes: 2
Reputation: 2569
Here is an option using map, filter and sum. It first splits each string at whitespace, filters out the non-numeric tokens, casts the remaining number strings to int, and finally sums them.
# if you want the sum per string in the list
sums = [sum(map(int, filter(str.isnumeric, s.split()))) for s in df]
# [19653, 7449]
# if you simply want the sum of all numbers of all strings
sum(sum(map(int, filter(str.isnumeric, s.split()))) for s in df)
# 27102
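One caveat worth noting: splitting on whitespace only counts tokens that consist entirely of numeric characters, whereas the regex from the question also finds digit runs attached to letters or punctuation. The input below is a hypothetical example, not taken from the question, to illustrate the difference.

```python
import re

# Hypothetical sample (not from the question): digits glued to punctuation and letters
s = "total: 12, code abc34 and 5"

# Tokens are 'total:', '12,', 'code', 'abc34', 'and', '5'; only '5' is fully numeric
by_split = sum(map(int, filter(str.isnumeric, s.split())))

# The regex picks up every run of digits: '12', '34', '5'
by_regex = sum(map(int, re.findall(r"[0-9]+", s)))

print(by_split)  # 5
print(by_regex)  # 51
```

For the sample in the question both approaches give the same result, since every number there is a separate whitespace-delimited word.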
Upvotes: 1
Reputation: 18106
Use a nested loop over df and over the words of each item, and append each number to numlist:
numlist = list()
for item in df:
    for word in item.split():
        if word.isnumeric():
            numlist.append(int(word))
print(numlist)
print(sum(numlist))
Out:
[4497, 6702, 8454, 7449]
27102
You could turn this into a one-liner using a list comprehension:
print(sum([int(word) for item in df for word in item.split() if word.isnumeric()]))
>>> 27102
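As a side note, str.isnumeric also accepts some characters that int() cannot parse, such as '½' or '²'. If the input might contain such characters, str.isdecimal is a stricter check. The tokens below are hypothetical and only meant to show the difference.

```python
words = ["42", "½", "²", "7"]  # hypothetical tokens, not from the question

# isnumeric accepts all four, but int("½") and int("²") would raise ValueError
numeric = [w for w in words if w.isnumeric()]

# isdecimal keeps only strings made of decimal digits, which int() can parse
decimal = [w for w in words if w.isdecimal()]
total = sum(int(w) for w in decimal)

print(numeric)  # ['42', '½', '²', '7']
print(decimal)  # ['42', '7']
print(total)    # 49
```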
Upvotes: 2
Reputation: 88226
You don't need a regex for this. Split the strings in the list, and sum those that are numeric in a comprehension:
sum(sum(int(i) for i in s.split() if i.isnumeric()) for s in df)
# 27102
Or similarly, flatten the resulting lists, and sum once:
from itertools import chain
sum(chain.from_iterable((int(i) for i in s.split() if i.isnumeric()) for s in df))
# 27102
Upvotes: 3