Reputation: 89
Program workflow:
CURRENT CODE:
import re
result = [line.split("\n")[0] for line in open('asigra_backup.txt') if re.match(r'^Errors:\s([1-9]|[1-9][0-9]|100)', line)]
print(result)
CURRENT OUTPUT:
['Errors: 1', 'Errors: 128']
DESIRED OUTPUT:
Errors: 1
Pasta
Fish
Dog
Doctonr
Errors: 128
Lemon
Seasoned
Rhinon
Goat
SAMPLE .TXT FILE
Errors: 1
Pasta
Fish
Dog
Doctonr
Errors: 128
Lemon
Seasoned
Rhinon
Goat
Errors: 0
Rhinon
Cat
Dog
Fish
Upvotes: 1
Views: 126
Reputation: 89
For those wanting additional clarification, as it may help the next person, this was my final solution:
import re
import sys
from itertools import chain

def errors_to_file(self):
    """
    Opens the file containing the Asigra backup log, "asigra_backup.txt", and collects all error blocks within the log.
    Uses a regular expression match on each line of the backup log file. Error number range is 1 - 100.
    Formats the error log by adding an empty string as every 10th element, which leaves a blank line between blocks.
    Writes the formatted error log to a file in the current directory: "asigra_errors.txt"
    """
    # "asigra_backup.txt" contains log information from the performed backup.
    with open('asigra_backup.txt', "r") as f:
        lines0 = [line.rstrip() for line in f]
    # Filled with each matching "Errors:" line plus the 8 lines that follow it.
    lines = []
    for i, line in enumerate(lines0):
        if re.match(r'^Errors:\s([1-9]|[1-9][0-9]|100)', line):
            lines.extend(lines0[i:i+9])
    if len(lines) == 0:
        print("No errors found")
        print("Gracefully exiting")
        sys.exit(1)
    k = ''
    N = 9
    # Split the collected lines into chunks of N and add a blank entry after each full chunk.
    formatted_errors = list(chain(*[lines[i : i+N] + [k]
                                    if len(lines[i : i+N]) == N
                                    else lines[i : i+N]
                                    for i in range(0, len(lines), N)]))
    with open("asigra_errors.txt", "w") as e:
        for line in formatted_errors:
            e.write(f"{line}\n")
Huge thank you to those that answered my question.
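One detail worth knowing about the pattern used above: re.match only anchors at the start of the line, so "Errors: 128" still matches through the single-digit alternative. That agrees with the desired output in the question. If the 1 - 100 range were meant to be strict, an end anchor would be needed. The sketch below compares the two; STRICT_PATTERN is a hypothetical variant and not part of the original solution.

```python
import re

# Pattern from the solution above, written as a raw string.
PREFIX_PATTERN = r'^Errors:\s([1-9]|[1-9][0-9]|100)'
# Hypothetical stricter variant: the trailing $ rejects anything after the number.
STRICT_PATTERN = r'^Errors:\s([1-9][0-9]?|100)$'

for line in ["Errors: 0", "Errors: 1", "Errors: 100", "Errors: 128"]:
    prefix_hit = re.match(PREFIX_PATTERN, line) is not None
    strict_hit = re.match(STRICT_PATTERN, line) is not None
    print(f"{line!r}: prefix={prefix_hit} strict={strict_hit}")
```

Only "Errors: 128" is treated differently by the two patterns; the lines are assumed to have trailing whitespace stripped already, as line.rstrip() does in the solution.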
Upvotes: 1
Reputation: 463
I wrote some code that prints the output as requested. Note that it only works when an extra Errors: 1 line is added as the last line of the input. This is the text I parsed:
data_to_parse = """
Errors: 56
Pasta
Fish
Dog
Doctonr
Errors: 0
Lemon
Seasoned
Rhinon
Goat
Errors: 45
Rhinon
Cat
Dog
Fish
Errors: 34
Rhinon
Cat
Dog
Fish1
Errors: 1
"""
The code below gives the desired output without using regex. It uses the indices of the Errors: lines to select the data.
lines = data_to_parse.splitlines()
errors_indices = []
i = 0
k = 1  # start at 1 so that errors_indices[k-1] is always the previous header
for line in lines:  # the positions of the Errors: lines are saved in errors_indices
    if 'Errors:' in line:
        errors_indices.append(i)
    i = i+1
while k < len(errors_indices):
    counter = False  # set to True when Errors: 0 is hit, so that block is skipped
    for j in range(errors_indices[k-1], errors_indices[k]):
        if 'Errors:' in lines[j]:
            lines2 = lines[j].split(':')
            lines2_val = lines2[1].strip()
            if int(lines2_val) != 0:
                print(lines[j])
            if int(lines2_val) == 0:
                counter = True
        elif 'Errors:' not in lines[j] and counter == False:
            print(lines[j])
    k = k+1
I ran the code a few times to check that it works properly, and it appears to give the requested output.
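A possible variant of the same idea that does not need the extra Errors: line at the end: keep a flag that is switched on or off at every Errors: header, and copy lines while it is on. This is only a sketch; the function name nonzero_error_blocks is made up for illustration, and it assumes every header starts the line and has the form "Errors: N".

```python
def nonzero_error_blocks(text):
    """Return the lines of every block whose 'Errors: N' header has N != 0."""
    kept = []
    keep = False  # True while we are inside a block with a non-zero count
    for line in text.splitlines():
        if line.startswith('Errors:'):
            # A new block starts here; decide whether to keep it.
            keep = int(line.split(':', 1)[1]) != 0
        if keep:
            kept.append(line)
    return kept


sample = """Errors: 56
Pasta
Fish
Errors: 0
Lemon
Errors: 34
Cat"""

print("\n".join(nonzero_error_blocks(sample)))
```

Because the flag is updated as soon as a header is seen, the last block is handled like any other and no sentinel line is required.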
Upvotes: 0
Reputation: 1525
Using a better regex together with re.findall can make this easier. The following regex detects every Errors: line with a count from 1 to 100, together with the 4 lines that follow it.
import re
with open('asigra_backup.txt', 'r') as f:
    regex_matches = re.findall(r'(?:[\r\n]+|^)((Errors:\s*([1-9][0-9]?|100))(?:[\r\n\s\t]+.*){4})', f.read())
with open('separate.txt', 'a') as out:
    out.write('\n' + '\n'.join([i[0] for i in regex_matches]))
To access the error lines or the error numbers, the following lines can be used:
error_rows = [i[1] for i in regex_matches]
error_numbers = [i[2] for i in regex_matches]
print(error_rows)
print(error_numbers)
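To see what re.findall returns here, the same pattern can be tried on an in-memory string instead of a file. The sketch below uses the sample from the question; the names PATTERN, sample and blocks are only for illustration. Each match is a tuple of (whole block, Errors: line, number).

```python
import re

# Same pattern as above, written as a raw string.
PATTERN = r'(?:[\r\n]+|^)((Errors:\s*([1-9][0-9]?|100))(?:[\r\n\s\t]+.*){4})'

# Sample file content from the question, as a single string.
sample = (
    "Errors: 1\nPasta\nFish\nDog\nDoctonr\n"
    "Errors: 128\nLemon\nSeasoned\nRhinon\nGoat\n"
    "Errors: 0\nRhinon\nCat\nDog\nFish\n"
)

matches = re.findall(PATTERN, sample)
blocks = [m[0] for m in matches]         # header plus the 4 following lines
error_rows = [m[1] for m in matches]     # just the "Errors: N" line
error_numbers = [m[2] for m in matches]  # just the number, as a string

print(error_rows)     # ['Errors: 1']
print(error_numbers)  # ['1']
```

Note that this pattern caps the count at 100, so the Errors: 128 block from the question's sample is skipped. If that block should be kept, the number part of the pattern would need to be widened.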
Upvotes: 0