Borea Deitz

Reputation: 505

How can I handle multiple lines at once while reading from a file?

The standard Python approach is to call open() to create a file object f, which lets you either load the entire file into memory at once with f.read() or read lines one by one with a for loop:

with open('filename') as f:
    # 1) Read the whole file into memory at once:
    all_data = f.read()

with open('filename') as f:
    # 2) Read lines one by one:
    for line in f:
        pass  # work with each line

I'm searching through several large files looking for a pattern that might span multiple lines. The most intuitive way to do this is to read line-by-line looking for the beginning of the pattern, and then to load in the next few lines to see where it ends:

with open('large_file') as f:

    # Read lines one-by-one:
    for line in f:
        if line.startswith("beginning"):
            # Load in the next line, i.e.
            nextline = f.getline(line+1)  # ??? #
            # or something

The line I've marked with # ??? # is my own pseudocode for what I imagine this should look like.

My question is, does this exist in Python? Is there any method for me to access other lines as needed while keeping the cursor at line and without loading the entire file into memory?

Edit: Inferring from the responses here and other reading, the answer is "No."

Upvotes: 1

Views: 1108

Answers (3)

fsimonjetz

Reputation: 5802

I think you're looking for .readline(), which returns the next line and advances the file position, so you can call it whenever you need another line. Here is a sketch that advances to the line where a pattern starts:

with open('large_file') as f:
    line = f.readline()

    while not line.startswith("beginning"):
        line = f.readline()
        
        # end of file
        if not line:
            print("EOF")
            break
    
    # If line is non-empty here, it is the first matching line:
    # do something with it, get additional lines by
    # calling .readline() again, etc.
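
The "get additional lines" step can also be done inside a plain for loop. Below is a minimal sketch, assuming Python 3 and assuming the pattern spans a known number of lines. The names find_blocks, start and extra_lines are made up for illustration and are not part of any library.

```python
import io
from itertools import islice


def find_blocks(f, start="beginning", extra_lines=2):
    """Yield each line starting with `start` plus the next `extra_lines` lines."""
    lines = iter(f)  # one shared iterator for the loop and the look-ahead
    for line in lines:
        if line.startswith(start):
            # islice pulls the following lines from the same iterator,
            # so the outer loop resumes after them.
            following = list(islice(lines, extra_lines))
            yield [line] + following


# Demo on an in-memory file; a file object returned by open() behaves the same.
sample = io.StringIO("x\nbeginning 1\na\nb\nc\nbeginning 2\nd\n")
blocks = list(find_blocks(sample))
print(blocks)
```

Note that the lines pulled in by the look-ahead are consumed: the outer loop will not test them against the start marker again. Only the current block is held in memory, never the whole file.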

Upvotes: 1

Patrick Artner

Reputation: 51653

Just store the interesting lines into a list while going line-wise through the file:

with open("file.txt","w") as f:
    f.write("""
a
b
------    
c
d
e
####
g
f""")

interesting_data = []
inside = False
with open("file.txt") as f:
    for line in f:
        line = line.strip()
        # start of interesting stuff
        if line.startswith("---"):
            inside = True

        # end of interesting stuff
        elif line.startswith("###"):
            inside = False

        # adding interesting bits
        elif inside:
            interesting_data.append(line)

print(interesting_data)            

to get

['c', 'd', 'e']
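
If the file can contain several such sections, the same flag idea can be wrapped in a generator that hands back one section at a time. This is only a sketch under the assumption that sections never nest. The name delimited_blocks and its start/end parameters are hypothetical, chosen to match the markers in the example above.

```python
import io


def delimited_blocks(lines, start="---", end="###"):
    """Yield a list of stripped lines for every start/end delimited section."""
    block = None  # None means "currently outside a section"
    for raw in lines:
        line = raw.strip()
        if line.startswith(start):
            block = []  # start of interesting stuff
        elif line.startswith(end):
            if block is not None:
                yield block  # end of interesting stuff
            block = None
        elif block is not None:
            block.append(line)  # adding interesting bits


# Demo on an in-memory file; pass the object from open("file.txt") instead.
sample = io.StringIO("a\n------\nc\nd\n####\ng\n---\nh\n###\n")
result = list(delimited_blocks(sample))
print(result)
```

Because it is a generator, only the section currently being collected lives in memory, which matters for the large files mentioned in the question.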

Upvotes: 1

Tim Roberts

Reputation: 54708

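Collect lines into a list, starting at the line that begins the pattern and stopping at the line that ends it, like this:

gather = []
for line in f:  # f is the open file object
    if gather:
        # Inside a match: keep collecting until the end marker shows up
        gather.append(line)
        if "ending" in line:
            process(''.join(gather))  # process() is whatever you do with a match
            gather = []
    elif line.startswith("beginning"):
        # Start of a new match
        gather = [line]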

Although in many cases it's easier just to load the whole file into a string and search it.

You may want to rstrip the newline before appending the line.
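
For the load-the-whole-file route mentioned above, a regular expression can do the multi-line search. This is a rough sketch, assuming the file fits in memory and that a match runs from a line starting with "beginning" to the first later line containing "ending". The sample text stands in for the result of f.read().

```python
import re

# Stand-in for: with open('large_file') as f: text = f.read()
text = "x\nbeginning 1\na\nending 1\ny\nbeginning 2\nending 2\n"

# re.MULTILINE lets ^ match at the start of every line; re.DOTALL lets .
# match newlines too; .*? is non-greedy, so each match stops at the first
# "ending", and [^\n]* then takes the rest of that line.
pattern = re.compile(r"^beginning.*?ending[^\n]*", re.MULTILINE | re.DOTALL)

matches = pattern.findall(text)
print(matches)
```

The trade-off is memory: this reads everything at once, so for very large files the line-by-line loop is the safer choice.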

Upvotes: 2
