ReLisK

Reputation: 55

PYTHON Re: Performance: Starting from a specific line in a text file, read each line, split it on tabs, then access each element.

As per the title, I want to do the following:

  1. Starting from a specific line x, read each line up to the end of the file. N.B. I don't want to use readlines(), as that reads the entire file into memory, and in testing it is very slow on the server I deployed to (it took about 15 minutes, whereas on my very good PC it takes 30 seconds).

  2. When the single line is read, I want to .split(" ") that specific line and load it into a list so I can access each element.

Please see my attempt below (edited to remove sensitive details):

with open(FileName, "w+") as file:
    file.write(FileName + "," + str(Quantity) + "\n")
    # Start from the beginning of the data, read each line and take the specific data
    for x in range(StartCount, Quantity + StartCount):
        os.chdir(FileLocation + country)
        with open(OutputFileName, 'r') as OutputFile:
            for x, line in enumerate(OutputFile):
                OutputFileData = [line.split("  ") for line in OutputFile]

                # Select the data you want from the output file.
                # N.B. OutputFileData[1][:-1] removes an extra part of a column.
                try:
                    FileData = OutputFileData[0] + "," + OutputFileData[1][:-1] + "," + OutputFileData[2]

.... I then go on to append FileData to the file I'm creating.

Note that my code works fine when I use:

with open(OutputFileName, 'r') as OutputFile:
    lines = OutputFile.readlines()
    temp = lines[x]
    OutputFileData = temp.split("   ")

But as I said before, I believe the script is slow on the server because it keeps re-running lines = OutputFile.readlines() on every pass of the loop, re-reading the whole file into memory each time. So when I check the file I'm trying to create, I see it stop at a certain number of lines and then just hang.
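
What I think I need is something that opens the file once and skips ahead lazily, e.g. with itertools.islice. A rough sketch (start_line and output_path are placeholder names, not my real code):

```

import itertools

output_path = "output.txt"  # placeholder path
start_line = 10             # zero-based index of the first line to read

with open(output_path, "r") as fh:
    # islice skips the first start_line lines lazily, without
    # loading the whole file into memory
    for line in itertools.islice(fh, start_line, None):
        fields = line.rstrip("\n").split("\t")  # split the line on tabs
        # access individual elements, e.g. fields[0], fields[1]

```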

Please help me figure out a better way.

Upvotes: 0

Views: 106

Answers (2)

ReLisK

Reputation: 55

Just coming back to say that the issue at the time wasn't actually my code; the server really is just that slow. So I ended up running the code on individual machines and then dropping the data onto the server where it needed to be. This improved performance immensely.

Upvotes: 1

MarkS

Reputation: 1539

How about reading in N lines at a time, processing those as a 'chunk', and then repeating the process? Something like this:

```

textfile = "f:\\mark\\python\\test.txt"


def read_n(file, x):
    # Generator that yields the file x lines at a time.
    with open(file, mode='r') as fh:
        while True:
            # Read up to x lines; readline() returns '' at EOF,
            # so data is empty once the file is exhausted.
            data = ''.join(fh.readline() for _ in range(x))

            if not data:
                break

            yield data


for nlines in read_n(textfile, 5):
    print(nlines)

```

Which yields (from my simple sample file):

abc
123
def
456
ghi

789
jkl
abc
123
def

456
ghi
789
jkl
abc

123
def
456
ghi
789

jkl
abc
123
def
456

ghi
789
jkl

I am merely printing the lines in chunks, but you could perform whatever processing you are doing.
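
For example, to plug in the tab-splitting from the question, you could process each chunk like this (the per-line handling here is just illustrative):

```

for chunk in read_n(textfile, 5):
    for line in chunk.splitlines():
        fields = line.split("\t")  # split each line on tabs
        # do your per-line processing here, e.g. use fields[0]

```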

Upvotes: 0
