vferi

Reputation: 11

Manipulating a (very) long data file with Python

I'm trying to write code that manipulates a very long text file (more than a million lines). The file contains timestamps at regular intervals (every 1003 lines); between them is the data I need, which is 1000 lines long, plus a header and two blank lines, which I do not need.

I want my code to take a number between 1 and 1000 from the user, which selects a timestamp, and copy the corresponding chunk of lines into a separate text file.

The code I've written works as expected if the input is '0', but doesn't provide any output if it's any other number.

Here is my code:

import sys

time = input()
output = open('rho_output_t' + str(time), 'w')
sys.stdout = output

filepath = 'rho.xg'
l = 2       # lower limit of 0th interval
u = 1001    # upper limit of 0th interval
step = 1003

with open(filepath) as fp:
    for t in range(0, 1000):
        print("{} ".format(t))  # this is only here so I can see the for loop running correctly
        for cnt, line in enumerate(fp):
            if int(time) == t and cnt >= l+(step*int(time)) and cnt <= u+(step*int(time)):
                print("Line {}: {}".format(cnt, line))

output.close()

Where did I mess up and how could I correct this? Thanks for the help in advance!

Upvotes: 1

Views: 114

Answers (2)

SpghttCd

Reputation: 10860

What about

filepath = 'rho.xg'
l = 2       #lower limit of 0th interval
u = 1001    #upper limit of 0th interval
step = 1003

time = int(input())
start = l + time * step

with open(filepath) as fin, open('rho_output_t' + str(time), 'w') as fout:
    for _ in range(start):
        next(fin)
    for i in range(u - l + 1):  # number of data lines per block (1000 here)
        line = next(fin)
        print(f'Line {start+i}: {line}')
        fout.write(line)
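
A variation on the same idea, as a minimal sketch: itertools.islice can do the skipping and the counting in one call. The helper name extract_chunk and the tiny in-memory sample are hypothetical stand-ins used only for illustration; with the real file you would pass the open file object and keep the defaults from the question (2, 1001, 1003).

```python
import io
from itertools import islice

def extract_chunk(lines, index, first=2, last=1001, step=1003):
    """Return the data lines of block `index` from an iterable of lines."""
    start = first + index * step
    stop = last + index * step + 1  # islice's stop is exclusive
    return list(islice(lines, start, stop))

# Tiny stand-in for rho.xg: blocks of 5 lines, data on lines 2..3 of each block.
sample = io.StringIO("".join(f"row {n}\n" for n in range(15)))
chunk = extract_chunk(sample, index=1, first=2, last=3, step=5)
print(chunk)
```

Because islice stops reading as soon as it reaches the end of the requested range, the rest of the file is never touched, which matters for a file with more than a million lines.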

Upvotes: 0

Zephyrus

Reputation: 364

Try:

with open(filepath) as fp:
    for t in range(0, 1000):
        print("{} ".format(t))  #this is only here so I can see the for loop running correctly
        if int(time) == t:
            for cnt, line in enumerate(fp):
                if cnt >= l+(step*int(time)) and cnt <= u+(step*int(time)):
                    print("Line {}: {}".format(cnt, line))

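This makes sure you only iterate over fp when t matches the input time. In your version, the inner loop reads the whole file during the first pass (t == 0), so the file object is already exhausted and there is nothing left to read for any later t.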
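
The exhaustion effect can be reproduced in a few lines. This is only an illustrative sketch: io.StringIO stands in for the open file (an assumption made so the snippet runs without rho.xg), but it iterates line by line the same way a text file does.

```python
import io

fp = io.StringIO("a\nb\nc\n")  # stand-in for an open text file

first_pass = [line for line in fp]   # consumes every line
second_pass = [line for line in fp]  # iterator is already at end of file

print(len(first_pass), len(second_pass))
```

The second loop sees no lines at all, which is why the original code only produced output for an input of 0. Calling fp.seek(0) before iterating again would rewind the file, though rereading a million-line file on every pass would be slow.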

Upvotes: 1
