Reputation: 55
As per the title, the issue is that I want to do the following:
Starting from a specific line x, read each line up to the end of the file. NB: I don't want to use readlines(), as that reads the entire file into memory, and when testing it is very slow on the server I deployed to (it took about 15 minutes, whereas on my very good PC it takes 30 seconds).
When a single line is read, I want to .split(" ") that specific line and load it into a list so I can access each element.
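Roughly, the shape I'm after is something like this (just an illustrative sketch with placeholder names and values, not my actual code):
```
import itertools

x = 10  # placeholder: the line number to start from (0-based)

# Stream the file one line at a time; islice skips the first x lines
# without ever loading the whole file into memory.
with open("output.txt", "r") as output_file:
    for line in itertools.islice(output_file, x, None):
        fields = line.split(" ")
        # ... use fields[0], fields[1], etc.
```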
Please see my attempt below (edited, as it contains sensitive data):
```
with open(FileName, "w+") as file:
    file.write(FileName + "," + Quantity + "\n")
    # Start from beginning of data, read each line and take specific data
    for x in range(StartCount, Quantity + StartCount):
        os.chdir(FileLocation + country)
        with open(OutputFileName, 'r') as OutputFile:
            for x, line in enumerate(OutputFile):
                OutputFileData = [line.split(" ") for line in OutputFile]
                # Select the data you want from the output file.
                # NB: OutputFileData[1][:-1] removes an extra part of a column
                try:
                    FileData = OutputFileData[0] + "," + OutputFileData[1][:-1] + "," + OutputFileData[2]
```
.... I then go on to append FileData to the file I'm creating.
Note that my code works fine when I use:
```
with open(OutputFileName, 'r') as OutputFile:
    lines = OutputFile.readlines()
    temp = lines[x]
    OutputFileData = temp.split(" ")
```
But as I said before, I believe the slowness on the server comes from calling lines = OutputFile.readlines() inside the loop, so the whole file gets re-read into memory on every iteration. When I check the file I'm trying to create, it stops at a certain number of lines and then just hangs.
Please help me figure out a better way.
Upvotes: 0
Views: 106
Reputation: 55
Just coming back to say the issue at the time wasn't actually my code; the server really is just that slow. So I ended up running the code on individual machines and then dropping the data onto the server where it needed to be. This improved performance immensely.
Upvotes: 1
Reputation: 1539
How about reading N lines at a time, processing them as a 'chunk', and then repeating? Something like this:
```
textfile = "f:\\mark\\python\\test.txt"

def read_n(file, x):
    # Generator: yield the file x lines at a time, never holding the whole file in memory
    with open(file, mode='r') as fh:
        while True:
            data = ''.join(fh.readline() for _ in range(x))
            if not data:
                break
            yield data

for nlines in read_n(textfile, 5):
    print(nlines)
```
Which yields (from my simple sample file):
```
abc
123
def
456
ghi
789
jkl
abc
123
def
456
ghi
789
jkl
abc
123
def
456
ghi
789
jkl
abc
123
def
456
ghi
789
jkl
```
I am merely printing the lines in chunks, but you could perform whatever processing you are doing.
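And since your goal is to split each line into fields, inside that loop you could break each chunk back into lines and split them; an untested sketch reusing the read_n generator above:
```
for nlines in read_n(textfile, 5):
    for line in nlines.splitlines():
        fields = line.split(" ")
        # do whatever you need with fields[0], fields[1], ...
        print(fields)
```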
Upvotes: 0