HindK

Reputation: 142

Python - Is there a way to move the current position of a file object back by one line?

I have a test file (not python script) that contains multiple sequences of the form:

testFile (not python script)

#Gibberish
#Gibberish
newSeq name-and-details
10 something
20 something
30 something
newSeq name-and-details
10 something
20 something
30 something
#Gibberish
#Gibberish
newSeq name-and-details
...and so forth

Then, I have a Python script that reads this file as input. For each new sequence, a new Python list is created to store the contents.

inputFile = open('testFile','r')
moreSeq = True
newLine = inputFile.readline()
while moreSeq:
  while (not ('newSeq' in newLine)):
    newLine = inputFile.readline()
  newList = []
  moreSeq = newList.listEntry(inputFile)
  listDB.append(newList)

But when the file object inputFile is passed to the listEntry method, I want its position to point to the beginning of the newSeq line, not to the line after it. That is, I want it to point to the newSeq line of sequence #1 rather than to 10 something.

How can I move the position of the file object back by one line, or by a fixed number of lines? I believe seek doesn't work in this case.

Upvotes: 1

Views: 1016

Answers (4)

deStrangis

Reputation: 1930

This is a common problem that is normally solved by unreading the line as in the following code:

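class SmartReader(object):
    """Wrap a file object so that one line can be pushed back ("unread")."""

    def __init__(self, file):
        self.file = file
        self.lastline = None  # the pushed-back line, if any

    def readline(self):
        # Hand out the pushed-back line first, then fall back to the file.
        if self.lastline is not None:
            ln = self.lastline
            self.lastline = None
            return ln
        return self.file.readline()

    def unreadline(self, line):
        self.lastline = line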


     ...


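    # its_newSeq, close_the_previous_sequence and process_the_line are placeholders.
    fd = SmartReader(open("file.txt"))
    readMore = True
    while readMore:
        line = fd.readline()
        if not line:
            # End of file: nothing left to read.
            readMore = False
        elif its_newSeq(line):
            # Push the header back so the next reader starts on it, then stop;
            # otherwise the same line would be read again on every pass.
            fd.unreadline(line)
            close_the_previous_sequence()
            readMore = False
        else:
            process_the_line(line)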
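
Applied to the file format in the question, the unread idea could look roughly like the sketch below. PushbackReader, read_sequence and read_all are hypothetical names, the sample data is made up, and I am assuming that a line starting with # or containing newSeq ends the current sequence.

```python
import io

# Made-up sample in the shape described in the question.
SAMPLE = """#Gibberish
newSeq first
10 something
20 something
newSeq second
10 something
#Gibberish
"""


class PushbackReader:
    """Hypothetical wrapper with one line of pushback (same idea as SmartReader)."""

    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.pending = None

    def readline(self):
        if self.pending is not None:
            line, self.pending = self.pending, None
            return line
        return self.fileobj.readline()

    def unreadline(self, line):
        self.pending = line


def read_sequence(reader):
    """Read one sequence; the reader must be positioned on its newSeq line."""
    seq = [reader.readline()]  # the header line itself
    while True:
        line = reader.readline()
        if not line:  # end of file
            break
        if 'newSeq' in line or line.startswith('#'):
            reader.unreadline(line)  # leave this line for the caller
            break
        seq.append(line)
    return seq


def read_all(fileobj):
    reader = PushbackReader(fileobj)
    sequences = []
    while True:
        line = reader.readline()
        if not line:
            return sequences
        if 'newSeq' in line:
            reader.unreadline(line)  # read_sequence starts on the header
            sequences.append(read_sequence(reader))


sequences = read_all(io.StringIO(SAMPLE))
```

With this layout the per-sequence reader always starts on the newSeq line, and the line that ends a sequence is never lost because it is pushed back before returning.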

Upvotes: 2

chthonicdaemon

Reputation: 19770

A direct solution to the problem may be to use itertools.chain, by doing

import itertools

moreSeq = newList.listEntry(itertools.chain([newLine], inputFile))

That way, the listEntry method sees an iterable that starts with the newSeq line, as you described. However, I suspect this will not solve the problem you have when listEntry finishes parsing and returns. You will probably want to rewind the file again at that point, because listEntry will probably have consumed one of the #Gibberish lines as well.

I must say that your code reads a bit more like C than Python. I think the line reading loop would be more legible as a for line in f style loop. It may be a better idea to rethink your approach to align better with the language.
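
A rough sketch of how the chain idea and a for loop could fit together is below. list_entry here is a hypothetical stand-in for your listEntry method, not your actual code: I am assuming it can hand back the extra line it consumed, so that line can be chained in front of the file again instead of rewinding.

```python
import io
import itertools

# Made-up sample in the shape described in the question.
SAMPLE = """#Gibberish
newSeq first
10 something
20 something
newSeq second
10 something
#Gibberish
"""


def list_entry(lines):
    """Hypothetical stand-in for listEntry.

    Expects an iterator whose first item is the newSeq header. Returns the
    collected lines plus the one extra line it had to consume to detect the
    end of the sequence (None if the input ran out).
    """
    entries = [next(lines)]  # the header line
    for line in lines:
        if 'newSeq' in line or line.startswith('#'):
            return entries, line
        entries.append(line)
    return entries, None


def read_sequences(fileobj):
    lines = iter(fileobj)
    sequences = []
    for line in lines:
        # Keep going while the leftover line is itself a new header.
        while line is not None and 'newSeq' in line:
            # Put the header back in front of the remaining lines.
            entries, line = list_entry(itertools.chain([line], lines))
            sequences.append(entries)
    return sequences


sequences = read_sequences(io.StringIO(SAMPLE))
```

Returning the leftover line avoids any seeking, which also means this works on inputs that cannot be rewound, such as pipes.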

Upvotes: 0

user2109788

Reputation: 1336

I think the same can be achieved with the following:

lists = []
with open('testFile','r') as f:
    for line in f:
        if '#Gib' in line:
            pass
        elif 'newSeq' in line:
            lists.append([])
        else:
            lists[-1].append(line)

This produces a list of lists containing the required lines. You can use any data structure you want. If each newSeq name-and-details line is unique, I think hashes (dictionaries) keyed by that line would be a better data structure.
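
A minimal sketch of the dictionary variant, assuming each newSeq line really is unique and that comment lines start with #. group_by_header and the sample data are made up for illustration.

```python
import io

# Made-up sample in the shape described in the question.
SAMPLE = """#Gibberish
newSeq alpha
10 something
20 something
#Gibberish
newSeq beta
10 something
"""


def group_by_header(fileobj):
    """Map each newSeq header line to the list of lines that follow it."""
    sequences = {}
    current = None  # list for the sequence being filled, if any
    for line in fileobj:
        line = line.rstrip('\n')
        if line.startswith('#'):
            continue  # skip the gibberish lines
        if 'newSeq' in line:
            current = sequences.setdefault(line, [])
        elif current is not None:
            current.append(line)
    return sequences


sequences = group_by_header(io.StringIO(SAMPLE))
```

Lines that appear before the first newSeq header are ignored here rather than raising an error, which is a choice you may want to change.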

Upvotes: 1

omu_negru

Reputation: 4770

You can use file.tell() to get the current position in the file, in bytes, and file.seek() to move the cursor to an arbitrary new position. With these two methods and the length of the line you just read, it should be easy enough to do what you intend:

f = open('foo.txt')
f.readline() # output `bar`
f.tell() # output 3
f.seek(0) # go to the start of the file

Upvotes: 1
