Reputation: 142
I have a text file (not a Python script) that contains multiple sequences of the following form:
testFile (not a Python script):
#Gibberish
#Gibberish
newSeq name-and-details
10 something
20 something
30 something
newSeq name-and-details
10 something
20 something
30 something
#Gibberish
#Gibberish
newSeq name-and-details
...and so forth
I then have a Python script that reads this file as input. For each new sequence, a new Python list is created to store its contents:
inputFile = open('testFile','r')
moreSeq = True
newLine = inputFile.readline()
while moreSeq:
    while (not ('newSeq' in newLine)):
        newLine = inputFile.readline()
    newList = []
    moreSeq = newList.listEntry(inputFile)
    listDB.append(newList)
But when the file object inputFile is passed to the listEntry method, I would like its position to be at the beginning of the newSeq line, not at the line after it, i.e. I want it to point to the newSeq #1 line rather than to 10 something.
How can I move the position of the file object back by one line, or by a fixed number of lines? I believe seek doesn't work in this case.
Upvotes: 1
Views: 1016
Reputation: 1930
This is a common problem, normally solved by "unreading" the line, as in the following code:
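class SmartReader(object):
    """Wraps a file object so that one line can be pushed back."""

    def __init__(self, file):
        self.file = file
        self.lastline = None

    def readline(self):
        # Return the pushed-back line first, if there is one
        if self.lastline is not None:
            ln = self.lastline
            self.lastline = None
            return ln
        return self.file.readline()

    def unreadline(self, line):
        # The next readline() call will return this line again
        self.lastline = line

...

# its_newSeq(), close_the_previous_sequence() and process_the_line()
# are placeholders for your own logic
def listEntry(fd):
    process_the_line(fd.readline())  # the 'newSeq ...' header that starts this sequence
    while True:
        line = fd.readline()
        if not line:                 # readline() returns '' at end of file
            close_the_previous_sequence()
            return False             # no more sequences
        if its_newSeq(line):
            fd.unreadline(line)      # push the next header back for the next call
            close_the_previous_sequence()
            return True
        process_the_line(line)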
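To check the pushback idea end to end, here is a minimal self-contained sketch that applies SmartReader to the question's file format. The helper read_sequences, the sample data, and the rule that lines starting with '#' are gibberish are assumptions for illustration; io.StringIO stands in for the real file.

```python
import io


class SmartReader:
    """Wraps a file object so that one line can be pushed back."""

    def __init__(self, file):
        self.file = file
        self.lastline = None

    def readline(self):
        # Return the pushed-back line first, if there is one.
        if self.lastline is not None:
            ln, self.lastline = self.lastline, None
            return ln
        return self.file.readline()

    def unreadline(self, line):
        # The next readline() call will return this line again.
        self.lastline = line


def read_sequences(fd):
    """Hypothetical helper: returns one list per sequence, header line first."""
    sequences = []
    while True:
        line = fd.readline()
        if not line:                       # '' signals end of file
            return sequences
        if not line.startswith('newSeq'):
            continue                       # skip '#Gibberish' lines between sequences
        current = [line.rstrip('\n')]      # the 'newSeq ...' header
        while True:
            body = fd.readline()
            if not body:                   # end of file also ends the last sequence
                break
            if body.startswith('newSeq'):
                fd.unreadline(body)        # push the next header back for the outer loop
                break
            if not body.startswith('#'):   # assumption: '#' marks gibberish lines
                current.append(body.rstrip('\n'))
        sequences.append(current)


sample = io.StringIO(
    "#Gibberish\n"
    "#Gibberish\n"
    "newSeq first\n"
    "10 something\n"
    "20 something\n"
    "newSeq second\n"
    "10 something\n"
    "#Gibberish\n"
    "newSeq third\n"
    "30 something\n"
)
result = read_sequences(SmartReader(sample))
print(result)
```

Because the inner loop pushes the next header back instead of discarding it, the outer loop never has to seek and each sequence starts exactly at its newSeq line.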
Upvotes: 2
Reputation: 19770
A direct solution to the problem may be to use itertools.chain, by doing:
import itertools
moreSeq = newList.listEntry(itertools.chain([newLine], inputFile))
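That way, the listEntry method sees an iterable that starts at the newSeq line you have already read, which is what you described. Note that listEntry must then iterate over its argument (with a for loop or next()) rather than call readline(), because a chain object has no readline method. However, I suspect that this will not solve the problem you have when listEntry parses the lines and returns: by then it will already have consumed the first line after the sequence (one of the #Gibberish lines or the next newSeq line), so you probably want to rewind again when that happens.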
I must say that your code reads a bit more like C than Python. I think the line-reading loop would be more legible as a for line in f style loop. It may be a better idea to rethink your approach to align better with the language.
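One way to handle the "rewind again" issue with the same chain trick is to have the sequence reader return the line it overran, so the caller can put it back in front of the remaining lines. This is only a sketch under assumptions: list_entry and read_all are hypothetical stand-ins for listEntry and the question's outer loop, and lines starting with '#' are assumed to be gibberish.

```python
import io
import itertools


def list_entry(lines):
    """Hypothetical stand-in for listEntry(): reads one sequence from an iterator.

    Returns (entries, overrun): entries starts with the 'newSeq ...' header, and
    overrun is the first line that did not belong to the sequence (None at end of
    input), which the caller must put back in front of the remaining lines.
    """
    entries = [next(lines).rstrip('\n')]   # the header is always the first item
    for line in lines:
        if line.startswith('newSeq') or line.startswith('#'):
            return entries, line
        entries.append(line.rstrip('\n'))
    return entries, None


def read_all(f):
    listDB = []
    lines = iter(f)
    pending = next(lines, None)            # a line read but not yet handled
    while pending is not None:
        if not pending.startswith('newSeq'):
            pending = next(lines, None)    # skip '#Gibberish' lines
            continue
        # "Un-read" the header by chaining it in front of the remaining lines.
        entries, pending = list_entry(itertools.chain([pending], lines))
        listDB.append(entries)
    return listDB


sample = (
    "#Gibberish\n"
    "newSeq first\n"
    "10 something\n"
    "newSeq second\n"
    "20 something\n"
    "#Gibberish\n"
)
listDB = read_all(io.StringIO(sample))
print(listDB)
```

Keeping a single pending line, rather than re-wrapping lines in a new chain after every sequence, means the chain objects never nest, and the same code works on a real file opened with open().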
Upvotes: 0
Reputation: 1336
I think the same can be achieved with the following:
lists = []
with open('testFile','r') as f:
    for line in f:
        if '#Gib' in line:
            pass
        elif 'newSeq' in line:
            lists.append([])
        else:
            lists[-1].append(line)
This builds a list of lists holding the required lines. You can use any data structure you want; if each newSeq name-and-details is unique, I think a dictionary (hash) keyed by that name would be a better data structure.
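A minimal sketch of the dictionary variant, assuming the name is whatever follows "newSeq" on the header line and that lines starting with '#' are gibberish. The function name read_sequences_by_name is hypothetical. It also skips stray lines before the first header instead of failing on lists[-1].

```python
import io


def read_sequences_by_name(f):
    """Map the name after 'newSeq' to the lines of that sequence."""
    seqs = {}
    current = None                         # line list of the sequence being filled
    for line in f:
        line = line.rstrip('\n')
        if line.startswith('#'):
            continue                       # skip '#Gibberish' lines
        if line.startswith('newSeq'):
            name = line[len('newSeq'):].strip()
            # Assumes names are unique: a repeated name replaces the earlier entry.
            current = seqs[name] = []
        elif current is not None:          # ignore stray lines before the first header
            current.append(line)
    return seqs


sample = "newSeq alpha\n10 x\n20 y\n#Gibberish\nnewSeq beta\n10 z\n"
print(read_sequences_by_name(io.StringIO(sample)))
```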
Upvotes: 1
Reputation: 4770
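You can use file.tell() to get the current position in the file and file.seek() to move the cursor to a new position. With these two methods it should be easy enough to do what you intend: save the position with tell() before reading a line, then seek() back to it to rewind by that line. Computing the position from the length of the line you just read is only reliable in binary mode; in text mode, seek() should only be given values returned by tell() (or 0).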
f = open('foo.txt')   # assume foo.txt contains just "bar" with no trailing newline
f.readline()          # returns 'bar'
f.tell()              # returns 3
f.seek(0)             # go back to the start of the file
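Applied to the question, the tell()/seek() approach could look like the sketch below: remember the position before each readline() and seek back once a header is found, so the header is read again by whatever parses the sequence. find_next_header is a hypothetical helper, and io.StringIO stands in for the real file. In Python 3, calling tell() on a text file while iterating over it with a for loop raises OSError, which is why the sketch calls readline() explicitly.

```python
import io


def find_next_header(f, marker='newSeq'):
    """Hypothetical helper: leave f positioned at the start of the next header line.

    Returns True if a header was found, False at end of file.
    """
    while True:
        pos = f.tell()            # position *before* reading the line
        line = f.readline()
        if not line:              # '' means end of file
            return False
        if line.startswith(marker):
            f.seek(pos)           # rewind by exactly one line
            return True


f = io.StringIO("#Gibberish\n#Gibberish\nnewSeq first\n10 something\n")
found = find_next_header(f)
header = f.readline()             # the header line is read again
print(found, repr(header))
```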
Upvotes: 1