Reputation: 371
I am reading a large file (10M to 30M records) line by line, applying some data manipulation to each line, and writing the result to another file. The transformation of the first line depends on data in the other lines, so I can write the first line to the output file only after traversing all the other lines.
I have tried fileinput, as shown below:
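import fileinput
import sys

# inplace=True redirects sys.stdout into temp_file_name, so every line is rewritten
with fileinput.input(temp_file_name, inplace=True) as file_inp:
    for file_line in file_inp:
        sys.stdout.write(file_line.replace('Header', transformed_header_line))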
Here, temp_file_name is the transformed file: its first line is the placeholder string Header, followed by all the other transformed lines. Using fileinput, I replace the string Header with the new header line and write the result back to the same file.
This process is slow. Are there alternative methods, such as writing to a byte stream or a generator, modifying the data later, and then writing it to the file?
Upvotes: 2
Views: 66
Reputation: 11
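If you are on Ubuntu, you can run the sed command in a subprocess, which is typically much faster than rewriting the file line by line in Python. Note that transformed_header_line is interpolated into the sed expression, so it must not contain unescaped /, & or \ characters.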
import subprocess
subprocess.run(["sed", "-i", f"s/Header/{transformed_header_line}/g", temp_file_name], check=True)
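If sed is not available, a pure-Python alternative might look like the sketch below. It assumes you can write the transformed body lines (everything except the header) to their own file first, and then build the final file once the header is known. The function name prepend_header and its parameters body_path, out_path, and chunk_size are hypothetical names chosen for illustration; they do not come from the question. The body is copied in large binary chunks with shutil.copyfileobj, so Python never inspects it line by line.

```python
import shutil


def prepend_header(body_path, out_path, header_line, chunk_size=16 * 1024 * 1024):
    """Write header_line to out_path, then bulk-copy body_path after it.

    All names here are illustrative assumptions, not part of the original code.
    """
    with open(out_path, "wb") as dst:
        # Make sure the header ends with exactly one newline.
        dst.write(header_line.rstrip("\n").encode("utf-8") + b"\n")
        with open(body_path, "rb") as src:
            # Copy the already-transformed body in large chunks,
            # without any per-line string processing.
            shutil.copyfileobj(src, dst, chunk_size)
```

You would call it as prepend_header(body_file_name, output_file_name, transformed_header_line) after the single pass over the input has finished. This still copies the body once, but it skips the placeholder search on every line.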
Upvotes: 1