Reputation: 59
I'm running Python 3.6.8. I need to sum values that appear in a log file. A line may contain 1 to 14 {index, value} pairs; a typical line with 8 pairs is shown in the code below (the variable called 'log_line'). The line format, including the '- -' separator, is consistent. I have working code, but I'm not sure it's the most elegant or best way to parse this string; it feels a bit clunky. Any suggestions?
import re
# version 1
log_line = 'Some explanatory text was here: - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'
log_line_values = log_line.split('- -')[1]
values = re.findall(r'{\d+,\s\d+}',log_line_values)
sum_of_values = 0
for v in values:
    sum_of_values += int(v.replace('{','').replace('}','').replace(' ','').split(',')[1])
print(f'1) sum_of_values:{sum_of_values}')
# version 2, essentially the same, but more concise (some may say confusing)
sum_of_values = sum([int(v.replace('{','').replace('}','').replace(' ','').split(',')[1]) for v in re.findall(r'{\d+,\s\d+}',log_line.split('- -')[1])])
print(f'2) sum_of_values:{sum_of_values}')
Upvotes: 0
Views: 104
Reputation: 2841
Assuming you've already identified that the line is one that matches the pattern, you can simplify your logic a lot by using a generator expression within sum().
import re
# Compile your regular expression for reuse
# Just pull out the last value from each pair
re_extract_val = re.compile(r'{\d+, (\d+)}')
log_line = 'Some explanatory text was here: - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'
# Use a generator expression within sum() to add all values
sum_of_values = sum(int(val) for val in re_extract_val.findall(log_line))
You could also use map(), but I find it's clearer with a generator expression:
sum_of_values = sum(map(int, re_extract_val.findall(log_line)))
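If you can't assume every line matches, one option is to check for the '- -' separator before summing. This is only a minimal sketch: the helper name sum_line_values and the sample lines are made up for illustration, and returning None for non-matching lines is just one possible convention.

```python
import re

# Same pattern as above: capture only the second number of each {index, value} pair
re_extract_val = re.compile(r'{\d+, (\d+)}')

def sum_line_values(line):
    """Return the sum of the values in one log line, or None if the
    line does not contain the '- -' separator (hypothetical helper)."""
    if '- -' not in line:
        return None
    return sum(int(val) for val in re_extract_val.findall(line))

# Made-up sample lines for illustration
lines = [
    'Some explanatory text was here: - -{0, 8} {1, 24} {2, 24}',
    'An unrelated line without any pairs',
]
for line in lines:
    print(sum_line_values(line))
```

The first line prints 56 and the second prints None, so the caller can tell "no pairs expected" apart from a sum of zero.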
Upvotes: 1
Reputation: 1451
This is an ideal use case for regular expression capture groups:
import re
log_line = 'Some explanatory text was here: - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'
pattern = r'{(\d+), (\d+)}'
s = sum([int(e[1]) for e in re.findall(pattern, log_line.split('- -')[1])])
print(s) # 95
Here I use re.findall to match the numbers in the input string, then a list comprehension to convert them to integers, and sum the result.
The advantage of the {(\d+), (\d+)} pattern is that it also captures the first number (the index), if you need it.
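As a rough sketch of what having the first number buys you, the pairs can be turned into a dict keyed by index. The name value_by_index is made up for illustration, and this assumes each index appears at most once per line (a repeated index would overwrite the earlier value).

```python
import re

log_line = 'Some explanatory text was here: - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'
pattern = r'{(\d+), (\d+)}'

# With two capture groups, re.findall returns a list of (index, value) string tuples
pairs = re.findall(pattern, log_line)

# Hypothetical use of the first number: map each index to its value
value_by_index = {int(index): int(value) for index, value in pairs}

print(value_by_index[1])             # 24
print(sum(value_by_index.values()))  # 95
```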
Upvotes: 0
Reputation: 2579
First, there's no need to get rid of the prefix: the regex will take care of not matching it. Second, we can use a capturing group to capture only the values we care about, in our case the second value in each comma-separated pair. We can use map(int, iterable) to convert every string to an int, and then call sum on the result.
Putting it all together:
import re
log_line = 'Some explanatory text was here: - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'
# Search the whole line; the prefix contains nothing the pattern can match
values = re.findall(r'{\d+,\s(\d+)}', log_line)
sum_of_values = sum(map(int, values))
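To illustrate the point about the prefix, here is a small sketch with a made-up line whose prefix contains digits, a comma and a stray brace. None of that is picked up, because only complete {digits, digits} pairs match.

```python
import re

# Made-up line whose prefix contains digits, a comma and a stray brace
log_line = 'Retry 3, attempt {2 of 14: - -{0, 8} {1, 24}'

# No split on '- -' is needed: only complete {digits, digits} pairs match
values = re.findall(r'{\d+,\s(\d+)}', log_line)
print(values)                 # ['8', '24']
print(sum(map(int, values)))  # 32
```

If the explanatory text could itself contain a complete {n, n} pair, it would be counted too; in that case splitting on '- -' first, as in the question, is the safer choice.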
Upvotes: 2