Reputation: 111
I have a file, file.txt, that looks like this:
testings 1
response 1-a
time 32s
testings 2
response 2-a
time 32s
testings 3
*blank*
testings 4
error
testings 5
response 5-a
time 26s
My code currently prints:
['testings 1', 'testings 2', 'testings 3', 'testings 4', 'testings 5']
['response 1-a', 'response 2-a', 'response 5-a']
['time 32s', 'time 32s', 'time 26s']
So it's simple code: it opens the file, uses readlines(), looks for the keywords testings, response and time, and then appends each matching line to one of 3 separate lists. As shown in file.txt, some testings x entries are followed by either *blank* or an error instead of a response. My problem is that I need the lists to always have the same length, like this:
['testings 1', 'testings 2', 'testings 3', 'testings 4', 'testings 5']
['response 1-a', 'response 2-a', '*error*', '*error*', 'response 5-a']
['time 32s', 'time 32s', '*error*', '*error*', 'time 26s']
So I was wondering whether it's possible to "read 3 lines at the same time" and have an if statement where all 3 lines need to have the right keywords ("be True"), or else insert *error* into the response and time lists to keep the lengths right. Or is there an even better way to keep 3 lists at the same length?
test = []
response = []
time = []
with open("textfile.txt", 'r') as txt_file:
    for line in txt_file.readlines():
        if "testings" in line:
            test.append(line.strip())
        if "response" in line:
            response.append(line.strip())
        if "time" in line:
            time.append(line.strip())
print(response)
print(test)
print(time)
Upvotes: 0
Views: 76
Reputation: 155
This snippet does what you are looking for. You can use next(txt_file, '') to retrieve the next line without having to load the whole file into memory first. You look only for lines that contain "testings" and, when you find one, you check the next two lines. The code always adds one string to each list whenever it finds "testings"; if it doesn't find "response" or "time" on the following lines, it inserts *error* in the appropriate list instead. Here is the code, using the input you gave above.
with open("textfile.txt", "r") as txt_file:
    test = []
    response = []
    time = []
    for line in txt_file:
        if "testings" in line:
            test_line = line.strip()
            response_line = next(txt_file, '').strip()
            time_line = next(txt_file, '').strip()
            test.append(test_line)
            if "response" in response_line:
                response.append(response_line)
            else:
                response.append("*error*")
            if "time" in time_line:
                time.append(time_line)
            else:
                time.append("*error*")
And the Output:
In : test
Out: ['testings 1', 'testings 2', 'testings 3', 'testings 4', 'testings 5']
In : response
Out: ['response 1-a', 'response 2-a', '*error*', '*error*', 'response 5-a']
In : time
Out: ['time 32s', 'time 32s', '*error*', '*error*', 'time 26s']
In : len(test), len(response), len(time)
Out: (5, 5, 5)
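One case worth checking with this approach: if a block has fewer than three lines, the two next() calls also consume whatever follows, which may be the next testings line. A minimal sketch that avoids reading ahead is below. It assumes the layout shown in the question, with *blank* being an empty line; the function name parse_blocks and the in-memory SAMPLE are made up for illustration.

```python
import io

# In-memory stand-in for the file, laid out as in the question
# (assumption: "*blank*" means an empty line).
SAMPLE = """\
testings 1
response 1-a
time 32s
testings 2
response 2-a
time 32s
testings 3

testings 4
error
testings 5
response 5-a
time 26s
"""


def parse_blocks(lines, placeholder="*error*"):
    """Return three equally long lists: test, response, time."""
    test, response, time = [], [], []
    for raw in lines:
        line = raw.strip()
        if line.startswith("testings"):
            # Start a new record; the placeholders keep the lengths equal.
            test.append(line)
            response.append(placeholder)
            time.append(placeholder)
        elif test and line.startswith("response"):
            response[-1] = line  # fill in the current record
        elif test and line.startswith("time"):
            time[-1] = line
    return test, response, time


test, response, time = parse_blocks(io.StringIO(SAMPLE))
print(test)
print(response)
print(time)
```

Because every testings line appends to all three lists at once, the lengths cannot drift apart, whatever follows the testings line.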
Upvotes: 1
Reputation: 1124110
Text files are iterable, meaning you can loop over them directly, or you can use the next() function to get another line from them. The file object will always produce the next line in the file whichever method you use, even when mixing techniques.
You can use this to pull in more lines inside a for loop:
with open("textfile.txt", 'r') as txt_file:
    for line in txt_file:
        line = line.strip()
        if line.startswith('testings'):
            # expect two more lines, response and time
            response_line = next(txt_file, '')
            if not response_line.startswith('response'):
                # not a valid block, scan forward to the next testings
                continue
            time_line = next(txt_file, '')
            if not time_line.startswith('time'):
                # not a valid block, scan forward to the next testings
                continue
            # valid block, we got our three elements
            test.append(line)
            response.append(response_line.strip())
            time.append(time_line.strip())
So when a line starting with testings is found, the code pulls in the next line. If that line starts with response, another line is pulled in. If that line starts with time, then all three lines are appended to your data structures. If either of those two conditions is not met, the outer for loop continues and reading the file proceeds until another testings line is found.
The added bonus is that the file is never read into memory in one go. File buffering keeps this efficient, but otherwise you never need more memory than is needed for the final set of lists (valid data), and the three lines currently being tested.
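The point that a for loop and next() share one position in the file can be checked with an in-memory file. A minimal sketch, using io.StringIO as a stand-in for the opened text file (the four-line content is made up):

```python
import io

# Stand-in for an open text file with four lines.
txt_file = io.StringIO("a\nb\nc\nd\n")

pairs = []
for line in txt_file:
    # next() advances the same iterator the for loop is using,
    # so the loop never sees the line pulled in here.
    following = next(txt_file, '')
    pairs.append((line.strip(), following.strip()))

print(pairs)  # [('a', 'b'), ('c', 'd')]
```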
Side note: I'd strongly recommend you do not use three separate lists of equal length. You could just use a single list with tuples:
test_data = []
# ... in the loop ...
test_data.append((line, response_line.strip(), time_line.strip()))
and then use that single list to keep each triplet of information together. You can even use a named tuple:
from collections import namedtuple
TestEntry = namedtuple('TestEntry', 'test response time')
# ... in the loop
test_data.append(TestEntry(line, response_line.strip(), time_line.strip()))
at which point each entry in the test_data list is an object with test, response and time attributes:
for entry in test_data:
    print(entry.test, entry.response, entry.time)
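If three parallel lists are still needed somewhere downstream, they can be derived from the single list on demand. A small sketch with two made-up entries (the variable names tests, responses and times are only for illustration):

```python
from collections import namedtuple

TestEntry = namedtuple('TestEntry', 'test response time')

# Two hand-written entries standing in for parsed file data.
test_data = [
    TestEntry('testings 1', 'response 1-a', 'time 32s'),
    TestEntry('testings 5', 'response 5-a', 'time 26s'),
]

# zip(*...) transposes the list of triplets into three columns.
tests, responses, times = (list(column) for column in zip(*test_data))

print(tests)      # ['testings 1', 'testings 5']
print(responses)  # ['response 1-a', 'response 5-a']
print(times)      # ['time 32s', 'time 26s']
```

Since each triplet is stored as one object, the derived columns always have the same length. Note that the unpacking fails on an empty test_data, so guard for that case if it can occur.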
Upvotes: 2
Reputation: 1747
From the answer here
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

test, response, time = [], [], []
with open("textfile.txt", 'r') as txt_file:
    # fillvalue='' keeps the "in" checks valid if the last batch is short
    for batch in grouper(txt_file, 3, fillvalue=''):
        if "testings" in batch[0]:
            test.append(batch[0].strip())
        else:
            test.append('error')
        if "response" in batch[1]:
            response.append(batch[1].strip())
        else:
            response.append('error')
        if "time" in batch[2]:
            time.append(batch[2].strip())
        else:
            time.append('error')
This assumes the lines always appear in the same order and that the file is always organised in batches of three lines, even if one of them is just a blank line. Since it actually looks like your input file has a blank line between each group of 3, you may need to change grouper to read batches of 4.
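As a rough illustration of the batches-of-4 variant, here is a sketch on an in-memory list of lines. It assumes every group occupies exactly four lines (three data lines plus a blank separator, with missing data replaced by blank lines); the sample lines and the rows list are made up.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# Hypothetical input: three groups, the second with no response or time.
lines = [
    "testings 1\n", "response 1-a\n", "time 32s\n", "\n",
    "testings 2\n", "\n", "\n", "\n",
    "testings 3\n", "response 3-a\n", "time 26s\n",
]

rows = []
for batch in grouper(lines, 4, fillvalue=""):
    # The fourth element is the separator line and is ignored.
    t, r, s = (item.strip() for item in batch[:3])
    rows.append((
        t if t.startswith("testings") else "*error*",
        r if r.startswith("response") else "*error*",
        s if s.startswith("time") else "*error*",
    ))

print(rows)
```

If the groups in the real file are not all the same size, fixed-size batching drifts out of alignment, so this only fits strictly regular files.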
Upvotes: 0