Chris Maness

Reputation: 1688

Using PHP to dump large databases into JSON

I have a slight problem with an application I am working on. The application is used as a developer tool to dump tables from a database on a MySQL server to a JSON file, which the devs grab by using the Unix curl command. So far the databases we've been using are relatively small (2GB or less), but recently we've moved into another stage of testing that uses fully populated tables (40GB+), and my simple PHP script breaks. Here's my script:

<?php 

$database = $_GET['db'];

ini_set('display_errors', 'On');
error_reporting(E_ALL);

# Connect
mysql_connect('localhost', 'root', 'root') or die('Could not connect: ' . mysql_error());

# Choose a database
mysql_select_db('user_recording') or die('Could not select database');

# Perform database query
$query = "SELECT * from `".$database."`";
$result = mysql_query($query) or die('Query failed: ' . mysql_error());

while ($row = mysql_fetch_object($result)) {
   echo json_encode($row);
   echo ",";
}

?>

My question to you is what can I do to make this script better about handling larger database dumps.

Upvotes: 4

Views: 1467

Answers (3)

Mahdi

Reputation: 9407

I'd say just let MySQL do it for you, not PHP:

SELECT 
 CONCAT("[",
      GROUP_CONCAT(
           CONCAT("{field_a:'",field_a,"'"),
           CONCAT(",field_b:'",field_b,"'}")
      )
 ,"]") 
AS json FROM table;

It should generate something like this:

[
    {field_a:'aaa',field_b:'bbb'},
    {field_a:'AAA',field_b:'BBB'}
]

Upvotes: 1

Matt

Reputation: 556

You might have a problem with MySQL buffering. But you might also have other problems. If your script is timing out, try disabling the timeout with set_time_limit(0). That's a simple fix, so if that doesn't work, you could also try:

  1. Try dumping your database offline, then transfer it via script or just direct HTTP. You might try making a first PHP script call a shell script, which calls a PHP-CLI script that dumps your database to text. Then just pull the dump via HTTP.
  2. Try having your script dump part of the database at a time (rows 0 through N, N+1 through 2N, and so on); a rough sketch follows this list.
  3. Are you using compression on your HTTP connections? If your lag is transfer time (not script processing time), then speeding up the transfer via compression might help (see the gzip sketch below). If the data itself is the issue, JSON might not be the best way to transfer it. Maybe it is. I don't know. This question might help you: Preferred method to store PHP arrays (json_encode vs serialize)
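For option 2, here is a rough sketch of a chunked dump, kept in the same deprecated mysql_* style as the question's script. The page parameter and the 10,000-row chunk size are made up for illustration; tune them to your setup:

<?php 

$database = $_GET['db'];
$page     = (int) $_GET['page'];  # hypothetical parameter: which chunk to return
$chunk    = 10000;                # rows per chunk

mysql_connect('localhost', 'root', 'root') or die('Could not connect: ' . mysql_error());
mysql_select_db('user_recording') or die('Could not select database');

# Fetch only one slice of the table; the devs call the script repeatedly,
# incrementing page until an empty result comes back.
$offset = $page * $chunk;
$query  = "SELECT * from `".$database."` LIMIT ".$offset.", ".$chunk;
$result = mysql_query($query) or die('Query failed: ' . mysql_error());

while ($row = mysql_fetch_object($result)) {
   echo json_encode($row);
   echo ",";
}

?>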

Also, for options 1 and 3, you might try looking at this question:

What is the best way to handle this: large download via PHP + slow connection from client = script timeout before file is completely downloaded
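For option 3, a minimal sketch of gzipping the output with PHP's zlib output handler. It assumes the zlib extension is loaded and that the devs fetch the URL with curl --compressed, so curl sends Accept-Encoding: gzip and decompresses on their end:

<?php 

# Compress the response when the client advertises gzip support;
# ob_gzhandler falls back to plain output if it does not.
ob_start('ob_gzhandler');

mysql_connect('localhost', 'root', 'root') or die('Could not connect: ' . mysql_error());
mysql_select_db('user_recording') or die('Could not select database');

$result = mysql_query("SELECT * from `".$_GET['db']."`") or die('Query failed: ' . mysql_error());

while ($row = mysql_fetch_object($result)) {
   echo json_encode($row);
   echo ",";
}

ob_end_flush(); # send the (compressed) buffer
?>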

Upvotes: 0

Aleksandar Vucetic

Reputation: 14953

This is what I think the problem is:

You are using mysql_query. mysql_query buffers data in memory, and then mysql_fetch_object just fetches that data from memory. For very large tables, you just don't have enough memory (most likely you are getting all 40G of rows into that one single call).

Use mysql_unbuffered_query instead. More info is on the MySQL Performance Blog, where you can find some other possible causes for this behavior.
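As a rough sketch, this is the question's loop with mysql_unbuffered_query swapped in (same deprecated mysql_* API, so rows stream from the server instead of the whole result set being buffered in PHP's memory first):

<?php 

mysql_connect('localhost', 'root', 'root') or die('Could not connect: ' . mysql_error());
mysql_select_db('user_recording') or die('Could not select database');

# Unbuffered query: rows are pulled from the server one at a time instead of
# being copied into PHP memory before the loop starts.
$result = mysql_unbuffered_query("SELECT * from `".$_GET['db']."`")
    or die('Query failed: ' . mysql_error());

while ($row = mysql_fetch_object($result)) {
   echo json_encode($row);
   echo ",";
   flush(); # push each row to the client as it is produced
}

?>

One caveat: with an unbuffered result you have to read all the rows before you can run another query on the same connection.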

Upvotes: 3
