Twisted1919

Reputation: 2499

Export large CSV file

I have a MySQL table where each record can have unlimited custom fields (EAV model, but that doesn't matter here), each field can have unlimited options, and each option can have unlimited values.
Right now I am trying to build an export tool that will export all these custom fields with their values, that is, name => value pairs for each field. That's not the important part; it's here just to highlight that we're talking about a lot of MySQL queries for a single record and that the size of the export will be pretty large.

For each row from my main table I must do around 100 separate SQL queries to get the fields, field options and field option values. These queries are pretty fast because they all use the right indexes, but we're still talking about 100 queries for a single record, and I expect to have around 50k records in my main table just to start with.
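
For context, here is a rough sketch of the kind of per-record lookups described above. The table and column names (custom_field, field_option, option_value) are placeholders I made up for illustration, not the real schema:

<?php
// Hypothetical illustration of the per-record EAV lookups described above.
// Table and column names are placeholders, not the actual schema.
function getCustomFieldsForRecord(PDO $pdo, $recordId)
{
    $pairs = array();

    // one query for the record's fields...
    $fields = $pdo->prepare('SELECT id, name FROM custom_field WHERE record_id = ?');
    $fields->execute(array($recordId));

    foreach ($fields->fetchAll(PDO::FETCH_ASSOC) as $field) {
        // ...then one query per field for its options...
        $options = $pdo->prepare('SELECT id FROM field_option WHERE field_id = ?');
        $options->execute(array($field['id']));

        $values = array();
        foreach ($options->fetchAll(PDO::FETCH_ASSOC) as $option) {
            // ...and one query per option for its values
            $stmt = $pdo->prepare('SELECT value FROM option_value WHERE option_id = ?');
            $stmt->execute(array($option['id']));
            $values = array_merge($values, $stmt->fetchAll(PDO::FETCH_COLUMN));
        }

        $pairs[$field['name']] = implode(', ', $values);
    }

    return $pairs; // name => value pairs, one CSV row for this record
}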

Right now, what I do is:

set_time_limit(0);
ini_set('memory_limit', '1G');
ini_set("auto_detect_line_endings", true);

$count = $export->count();
$date = date('Y-m-d-H-i-s');
$fileName = CHtml::encode($export->name) .'-'. $date . '.csv';

$processAtOnce = 100;
$rounds = ceil($count / $processAtOnce); // ceil() so the last partial page is not skipped

header("Content-disposition: attachment; filename={$fileName}");
header("Content-Type: text/csv");

// open the output stream once; everything written to it goes straight to the browser
$outStream = fopen('php://output', 'w');

$headerSet = false;
for ($i = 0; $i < $rounds; ++$i) {

    $limit  = $processAtOnce;
    $offset = $i * $processAtOnce;
    $rows   = $export->find($limit, $offset);

    if (empty($rows)) {
        continue;
    }

    if (!$headerSet) {
        fputcsv($outStream, array_keys($rows[0]), ',', '"');
        $headerSet = true;
    }

    foreach ($rows as $row) {
        fputcsv($outStream, array_values($row), ',', '"');
    }

    // push this batch to the browser so it doesn't pile up in the output buffer
    flush();
}

fclose($outStream);

Basically I count all the records and "paginate" them for the export, then run through the pages to avoid loading too many SQL results at once.
Is this a valid approach? Any thoughts?

My alternative would be to count all the records, split them into "pages", and for each page do an AJAX request (a recursive function called after the previous request has completed successfully). Each AJAX request would process maybe 1k records at once (these 1k would also be split like in the example above, run internally 10 times with 100 results each) and write them into a temporary directory (like part-1.csv, part-2.csv). At the end, when all the records are processed, I would create an archive from the folder containing all the CSV parts, force the browser to download it, and then remove it from the server (window.location.href from within the last AJAX call).
Is this a good alternative to the above?
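
To picture the server side of one such request, here is a rough sketch, assuming a hypothetical endpoint that receives the page number in $_POST and writes its chunk to its own part file; $export, the paths and the parameter names are placeholders for illustration only:

<?php
// Hypothetical server side of one AJAX request: write this page's rows to its own part file.
// Header-row handling is left out (it could be written only into part-0, for example).
$page          = intval($_POST['page']); // which "page" of the export this request handles
$perRequest    = 1000;                   // records handled by one AJAX request
$processAtOnce = 100;                    // internal batch size, as in the first example

$partFile = "/path/to/temp/export-dir/part-{$page}.csv";
$handle   = fopen($partFile, 'w');

for ($i = 0; $i < $perRequest / $processAtOnce; ++$i) {
    $offset = ($page * $perRequest) + ($i * $processAtOnce);
    $rows   = $export->find($processAtOnce, $offset);

    if (empty($rows)) {
        break;
    }
    foreach ($rows as $row) {
        fputcsv($handle, array_values($row), ',', '"');
    }
}

fclose($handle);

// tell the JS side whether to fire the next request or to finish up
echo json_encode(array('page' => $page, 'done' => empty($rows)));

The JS side would simply request page 0, 1, 2, ... until "done" comes back true.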

Please note, my goal is to limit the amount of memory used, which is why I think the second approach would help me more.

Please let me know what you think.
Thanks.

Upvotes: 6

Views: 23806

Answers (3)

Vivek Chaturvedi

Reputation: 540

Below is a more optimized approach to exporting a large CSV file (thanks to @Joe for the original code):

  1. Make AJAX requests to the server in a loop; the AJAX call process is shown below.
  2. The server reads records in batches of chunkSize at a time, to avoid getting too many results back at once from MySQL.
  3. The export file (file-export.csv in the code below) is opened in write mode on the first request and in append mode on subsequent requests.
  4. The results are written to this file. When there are no more records to add, the JS function triggers the file download.

Below is an example JS function:

<script>
// totalRecords must hold the total number of rows to export (for example, echoed into the page by PHP)
var totalRecords = 0;
var exportedRecords = 0;
var chunkSize = 500; // as per query performance
var action = 'export.php'; // URL of the PHP script shown below; adjust to your own endpoint

// note: this loop fires all the requests without waiting for the previous one to finish
for (var start = 0; start <= totalRecords; start += chunkSize) {
    chunkCSVExport(start, chunkSize);
}

function chunkCSVExport(start, chunkSize) {
    var requestData = {};
    requestData['start'] = start;
    requestData['limit'] = chunkSize;
    jQuery.ajax({
        type : "post",
        dataType : "json",
        url : action,
        data : requestData,
        success: function(response) {
            console.log(response);
            exportedRecords += chunkSize;
            downloadfile();
        }
    });
}

function downloadfile() {
    if (exportedRecords >= totalRecords) {
        // call the download file function here
    }
}
</script>

Below is example PHP code:

<?php
$start = $_POST['start'];
$limit = $_POST['limit'];


if($start==0) {
    $handle = fopen( 'file-export.csv', 'w' );
}else{
    $handle = fopen( 'file-export.csv', 'a' );
}


// Run the query for this chunk, from $start with $limit rows
$results = getresults($query); // getresults() and $query stand in for your own query code

if($start==0) {
    $headerDisplayed = false;
}else{
    $headerDisplayed = true;
}

foreach ( $results as $data ) {
    // Add a header row if it hasn't been added yet
    if ( !$headerDisplayed ) {
        // Use the keys from $data as the titles
        fputcsv($handle, array_keys($data));
        $headerDisplayed = true;
    }
    // Put the data into the stream
    fputcsv($handle, $data);
}

// Close the file
fclose($handle);

// Output some stuff for jquery to use
$response = array(
    'result'        => 'success'
);
echo json_encode($response);
exit;
?>

Upvotes: 3

Joe

Reputation: 3151

Thanks for the post Twisted1919, it gave me some inspiration. I know this post is a bit old, but I thought I would post some code from my solution so far in case it helps anyone else.

It uses some WordPress functions for the DB queries.

I am replacing your steps 3 and 4 with:

<?php
// if it's the first run, truncate the file; otherwise append to it
if($start==0) {
    $handle = fopen( 'temp/prod-export'. '.csv', 'w' );
}else{
    $handle = fopen( 'temp/prod-export'. '.csv', 'a' );
}
?>

Some basic jQuery

<script>
    // do stuff on the form submit
    $('#export-form').submit(function(e){
        e.preventDefault();
        var formData = jQuery('#export-form').serializeObject();
        var chunkAndLimit = 1000;
        doChunkedExport(0,chunkAndLimit,formData,$(this).attr('action'),chunkAndLimit);
    });
    // function to trigger the ajax bit
    function doChunkedExport(start,limit,formData,action,chunkSize){
        formData['start']  = start;
        formData['limit']  = limit;
        jQuery.ajax({
            type : "post",
            dataType : "json",
            url :  action,
            data : formData,
            success: function(response) {
                console.log(response);
                if(response.result=='next'){
                    start = start + chunkSize;
                    doChunkedExport(start,limit,formData,action,chunkSize);
                }else{
                    console.log('DOWNLOAD');
                }
            }
        });
    }
    // A function to turn all form data into a jquery object
    jQuery.fn.serializeObject = function(){
        var o = {};
        var a = this.serializeArray();
        jQuery.each(a, function() {
            if (o[this.name] !== undefined) {
                if (!o[this.name].push) {
                    o[this.name] = [o[this.name]];
                }
                o[this.name].push(this.value || '');
            } else {
                o[this.name] = this.value || '';
            }
        });
        return o;
    };
</script>

The php bit

<?php
global $wpdb;

$postCols = array(
    'post_title',
    'post_content',
    'post_excerpt',
    'post_name',
);

header("Content-type: text/csv");

$start = intval($_POST['start']);
$limit = intval($_POST['limit']);

// check the total results to workout the finish point
$query = "SELECT count(ID) as total FROM `wp_posts` WHERE post_status = 'publish';";
$results = $wpdb->get_row( $query, ARRAY_A );
$totalResults = $results['total'];
$result = 'next';
if( ($start + $limit ) >= $totalResults){
    $result = 'finished';
}

// if it's the first run, truncate the file; otherwise append to it
if($start==0) {
    $handle = fopen( 'temp/prod-export'. '.csv', 'w' );
}else{
    $handle = fopen( 'temp/prod-export'. '.csv', 'a' );
}

$cols = implode(',',$postCols);
//The query
$query = "SELECT {$cols} FROM `wp_posts` WHERE post_status = 'publish' LIMIT {$start},{$limit};";
$results = $wpdb->get_results( $query, ARRAY_A );

if($start==0) {
    $headerDisplayed = false;
}else{
    $headerDisplayed = true;
}

foreach ( $results as $data ) {
    // Add a header row if it hasn't been added yet
    if ( !$headerDisplayed ) {
        // Use the keys from $data as the titles
        fputcsv($handle, array_keys($data));
        $headerDisplayed = true;
    }
    // Put the data into the stream
    fputcsv($handle, $data);
}

// Close the file
fclose($handle);

// Output some stuff for jquery to use
$response = array(
    'result'        => $result,
    'start'         => $start,
    'limit'         => $limit,
    'totalResults'  => $totalResults
);
echo json_encode($response);


// Make sure nothing else is sent, our file is done
exit;
?>

Upvotes: 2

Twisted1919

Reputation: 2499

My final approach is the second one. After a lot of tests I concluded that, in my case, the second approach is way better in terms of memory usage. Even though the time to complete the entire export is longer, that doesn't matter, since the GUI updates with live stats about the export and overall it is a good user experience while waiting for the export to finish.

These are the steps I took:
1) Load the page and make the first AJAX request to the server.
2) The server reads the first 1000 records in batches of 100 records at a time, to avoid getting too many results back at once from MySQL.
3) The results are written to a file as part-x.csv, where x is the request number sent by AJAX.
4) When there are no more records to add to the file, the last AJAX call creates the archive and deletes the folder containing the part-x.csv files. The server then returns a JSON param called "download" which contains the URL to download the file via PHP (fopen + fread + flush + fclose, followed by unlinking the archive file); a sketch of this step follows below.
5) Using the "download" param, the browser does window.location.href = json.download and forces the file to be downloaded.
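
A minimal sketch of what steps 4 and 5 can look like, assuming PHP's ZipArchive, a temp folder full of part-x.csv files, and a separate download script; every path and file name here is a placeholder:

<?php
// Hypothetical end of the last AJAX call: pack the part files into one archive,
// remove the parts, and hand the download URL back to the JS side.
$partsDir = '/path/to/temp/export-dir';
$archive  = '/path/to/temp/export.zip';

$zip = new ZipArchive();
$zip->open($archive, ZipArchive::CREATE | ZipArchive::OVERWRITE);
foreach (glob($partsDir . '/part-*.csv') as $partFile) {
    $zip->addFile($partFile, basename($partFile));
}
$zip->close();

// the parts are inside the archive now, so the folder can be cleaned up
foreach (glob($partsDir . '/part-*.csv') as $partFile) {
    unlink($partFile);
}

// step 5: the JS side does window.location.href = json.download with this value
echo json_encode(array('download' => '/download.php?file=export.zip'));
exit;

And the download script itself, doing the fopen + fread + flush + fclose + unlink part of step 4:

<?php
// Hypothetical download script: stream the archive to the browser, then remove it.
$archive = '/path/to/temp/export.zip';

header('Content-Type: application/zip');
header('Content-Disposition: attachment; filename="export.zip"');
header('Content-Length: ' . filesize($archive));

$fp = fopen($archive, 'rb');
while (!feof($fp)) {
    echo fread($fp, 8192);
    flush();
}
fclose($fp);
unlink($archive); // remove the archive once it has been sent
exit;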

I know it's more work this way, but as I said, the end result seems better than just loading everything at once the way I did the first time.

Upvotes: 7
