Reputation: 3710
I have to create a big CSV export file of more than 400 MB with PHP. First drafts of the export file and the PHP code already allow some guesses about the performance.
To avoid extremely long processing times, I should focus on creating the export file efficiently and avoid PHP array operations, as they are too slow in this case. "Create the file efficiently" means: append big blocks of text to other big blocks in the file, with each big block created quickly.
Unfortunately, the "big blocks" are rather rectangles than lines. Building my export file will start with lots of line beginnings, like this:
Title a, Title b, Title c \n
"2014", "07", "01" \n
"2014", "07", "02" \n
...
Then I would have to add a "rectangle" of text to the right of the line beginnings:
Title a, Title b, Title c, extension 1, extension 2, extension 3 \n
"2014", "07", "01", "23", "1", "null" \n
"2014", "07", "02", "23", "1", "null" \n
...
If I have to do this line by line, it will slow me down again. So I'm hoping for a way to add "rectangles" to a file, just as you can in some text editors. Concrete experience with huge text buffers in PHP would also help.
As it is not my own hosting, I'm not sure whether I have permission to invoke sed/awk.
So the question is: Can somebody advise from experience how to handle big CSV files in PHP efficiently (file block operations, file "rectangle" operations), or simply how to handle big string buffers in PHP efficiently? There seems to be no framework for string buffers.
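To illustrate what I mean by "append big blocks", here is a rough sketch of buffered writing as I imagine it; the file name, buffer size and line content are just placeholders, not my real code:

    <?php
    // Sketch: collect lines in a string buffer and flush it in big chunks,
    // instead of calling fwrite() once per line.
    $out = fopen('export.csv', 'w');
    $buffer = '';
    $flushEvery = 1024 * 1024; // flush roughly every 1 MB (size is an assumption)

    for ($i = 0; $i < 1000000; $i++) {
        // build one CSV line; the content here is just dummy data
        $buffer .= '"2014","07","01","23","1","null"' . "\n";
        if (strlen($buffer) >= $flushEvery) {
            fwrite($out, $buffer);
            $buffer = '';
        }
    }
    fwrite($out, $buffer); // flush the rest
    fclose($out);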
Thank you for your attention :-)
Note: This is not a duplicate of this: https://stackoverflow.com/questions/19725129/creating-big-csv-file-in-windows-apache2-php
Upvotes: 0
Views: 2817
Reputation: 3710
Encouraged by the answers/comments to my question, I've written a short benchmark test.
Section a) creates two files, each with 1 million lines of 100 characters. It then merges them into a third file, line by line, like a zipper:
line1_1 line2_1
line1_2 line2_2
line1_3 line2_3
That's what Raphael Müller suggested.
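A rough sketch of that merge step, assuming both input files have the same number of lines (file names are placeholders):

    <?php
    // Sketch of section a): merge two files line by line like a zipper.
    $left  = fopen('file1.txt', 'r');
    $right = fopen('file2.txt', 'r');
    $out   = fopen('merged.txt', 'w');

    // read one line from each file, glue them together, write the result
    while (($l = fgets($left)) !== false && ($r = fgets($right)) !== false) {
        fwrite($out, rtrim($l, "\r\n") . ' ' . $r);
    }

    fclose($left);
    fclose($right);
    fclose($out);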
Section b) fills 1 million rows (same size as in section a) into a MySQL table with two columns. It fills the first column first, with 1 million single INSERT statements. Then, with one UPDATE statement, it fills the second column. This way, one command handles many rows in one step (the "rectangular" action described in the question). The table would then hold the merged data, ready to read out and download.
That's what Florin Asavoaie suggested.
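For comparison, a rough sketch of section b) with PDO; the DSN, credentials, table and column names are assumptions, and REVERSE(col_a) only stands in for the real extension data:

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');
    $pdo->exec('CREATE TABLE IF NOT EXISTS export_rows (
        id INT AUTO_INCREMENT PRIMARY KEY,
        col_a VARCHAR(120),
        col_b VARCHAR(120)
    )');

    // step 1: one million single INSERTs for the first column
    $insert = $pdo->prepare('INSERT INTO export_rows (col_a) VALUES (?)');
    for ($i = 0; $i < 1000000; $i++) {
        $insert->execute([str_pad('line' . $i, 100, 'x')]);
    }

    // step 2: one "rectangular" UPDATE filling the second column for all rows
    $pdo->exec('UPDATE export_rows SET col_b = REVERSE(col_a)');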
Filling one file with 1 million lines of 100 characters each takes 4.2 seconds. Merging both files into a third file takes 10 seconds.
Filling a MySQL table with 1 million rows of 100 characters each, using single INSERT statements, takes 440 seconds. So I haven't even measured the second step.
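The file timings were taken roughly like this (my actual script differs in detail; this is just a sketch with placeholder names):

    <?php
    // Sketch: time the creation of one file with 1 million lines of 100 chars.
    $start = microtime(true);

    $fh   = fopen('file1.txt', 'w');
    $line = str_pad('line', 99, 'a') . "\n"; // 100 chars including the newline
    for ($i = 0; $i < 1000000; $i++) {
        fwrite($fh, $line);
    }
    fclose($fh);

    printf("wrote 1M lines in %.1f seconds\n", microtime(true) - $start);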
This is not a final, general conclusion about the performance of databases or file systems. The database could probably be optimized with more freedom at the hosting (which I don't have).
I think for now it is somewhat safe to assume this performance order: RAM is faster than the file system, and the file system is faster than the database.
Which means: if your RAM is bursting at the seams while you create an export file, don't hesitate to write it in parts to files and merge them, instead of going to great lengths to manage memory blocks.
PHP is not a language that offers sophisticated low-level memory block handling. But in the end, you won't need it.
Upvotes: 1
Reputation: 918
Just put all the data into an SQL database (even SQLite would be more than fine for this purpose) and then export it as CSV.
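For example, with PDO and SQLite the export step could look roughly like this; file names, table layout and query are placeholders, not a prescribed implementation:

    <?php
    // Sketch: stage the data in SQLite, then stream it out as CSV with fputcsv().
    $pdo = new PDO('sqlite:export.sqlite');
    $pdo->exec('CREATE TABLE IF NOT EXISTS rows (a TEXT, b TEXT, c TEXT)');

    // ... fill the table here (plain INSERTs, ideally inside one transaction) ...

    $out  = fopen('export.csv', 'w');
    $stmt = $pdo->query('SELECT a, b, c FROM rows', PDO::FETCH_NUM);
    foreach ($stmt as $row) {
        fputcsv($out, $row);
    }
    fclose($out);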
Upvotes: 5