user1160230

Reputation:

Big JSON data to MySQL

I'm trying to import JSON data into MySQL. My JSON file is roughly 3.7 MB and has about 17k rows (test data; the real data will be around 65k rows). With my script the import is very slow, taking nearly 8-9 minutes. Is there a faster way to import JSON data into MySQL with a PHP progress interface? I'm also trying to add a progress bar feature, and that part works for now.

    $veri = json_decode(file_get_contents('auctions.json'));

    $a     = $veri->alliance->auctions;
    $sayi  = count($a);
    $yuzde = max(1, round($sayi / 100)); // rows per ~1% progress step (avoid division by zero)

    echo "<div id='tasiyici'>";

    // '' for id relies on the column being AUTO_INCREMENT
    $sql = $db->prepare("INSERT INTO auctions (id, auc, item, owner, bid, buyout, quantity, timeLeft) VALUES ('', ?, ?, ?, ?, ?, ?, ?)");

    for ($i = 0; $i < $sayi; $i++) { // $i < $sayi: the last valid index is $sayi - 1
        $sql->execute(array($a[$i]->auc, $a[$i]->item, $a[$i]->owner, $a[$i]->bid, $a[$i]->buyout, $a[$i]->quantity, $a[$i]->timeLeft));

        if ($i % $yuzde === 0) { // redraw the progress bar roughly once per 1%
            $y     = $i / $yuzde;
            $pos   = 40; // defaults match the stylesheet's -40px background offset
            $color = '';

            if (($y * 4 + 4) > 180)  { $pos = 40 - (($y * 4 + 4) - 180); $color = "color:#fff"; }
            if (($y * 4 + 4) >= 220) { $pos = 0; }

            echo "<div class='rakam' style='background-position:-" . $pos . "px 0;" . $color . "'>%" . ($y + 1) . "</div>";
            echo "<div class='yuzde' style='width:" . ($y * 4 + 4) . "px;'></div>";
            ob_flush();
            flush();
        }
    }

    echo "</div>";

    echo "<br> $sayi data added.";

CSS code:

<style>
        body {
            font-family:Arial;
        }
            #tasiyici {
                width:400px;
                height:17px;
                display: block;
                position: relative;
                margin:50px auto;
                background:#e3e3e3;
                border-radius:5px;
                overflow: hidden;
                border:1px solid #ccc;
            }
            .yuzde {
                height:17px;
                display: block;
                width:1px;
                background:url("progressOverlay.png");
                position: absolute;
                top:0;
                left:0;
                z-index:1;
            }
            .rakam {
                width:40px;
                height:16px;
                display: block;
                line-height:17px;
                position: absolute;
                left:50%;
                top:0;
                margin-left:-20px;
                z-index:9999;
                background:url("progressOverlay.png") -40px 0 #e3e3e3 no-repeat;
                font-size:11px;
            }

        </style>

Upvotes: 0

Views: 3444

Answers (3)

T9b

Reputation: 3502

I had a similar problem to solve, except that I also did not know the number of records coming from an external API. I realised there was a risk that my job could just fall over mid-process, and I didn't want to have to start again from scratch.

It strikes me that your issue is similar in that respect. What you are asking your database to do is attempt an insert even if the record may already exist. That's costly.

What I did was build a control table that stored the key of the data, or some other unique identifier. During processing, the job reads the control table to determine whether the record already exists; if it does, it just skips to the next record, otherwise it inserts it.

I created a record number limiter that I could test the load with (say, limiting it to the first 100 records), and with this tool I was able to run the PHP script in 5 open browser tabs to effectively create parallel loading.
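A minimal sketch of that control-table idea in PHP/PDO, assuming a hypothetical table named import_control with a unique auc column, plus the same $db connection and decoded $a array as in the question; the limiter value is just the example from above:

    // Sketch only: skip rows whose key is already recorded in a control table.
    // Hypothetical table: CREATE TABLE import_control (auc BIGINT PRIMARY KEY)
    $check  = $db->prepare("SELECT 1 FROM import_control WHERE auc = ?");
    $mark   = $db->prepare("INSERT INTO import_control (auc) VALUES (?)");
    $insert = $db->prepare("INSERT INTO auctions (auc, item, owner, bid, buyout, quantity, timeLeft)
                            VALUES (?, ?, ?, ?, ?, ?, ?)");

    $limit = 100; // record number limiter for test runs, as described above
    foreach (array_slice($a, 0, $limit) as $row) {
        $check->execute(array($row->auc));
        if ($check->fetchColumn()) {
            continue; // already imported by an earlier (or parallel) run, skip it
        }
        $insert->execute(array($row->auc, $row->item, $row->owner, $row->bid,
                               $row->buyout, $row->quantity, $row->timeLeft));
        $mark->execute(array($row->auc));
    }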

Upvotes: 1

Luc Franken

Reputation: 3014

9 minutes * 60 seconds = 540 seconds

17000 / 540 = ~31.5 records per second when inserting

Because you did not post your server configuration, load, RAM, etc., we unfortunately cannot pinpoint where the issue lies. In any case, ~31.5 inserts/sec is not much for a serious server. It should be very doable, especially since your row size doesn't seem big.

What you need to do is take some serious measurements. For example, a lack of memory will create issues, and lots of expensive indexes on the table will slow insertion down considerably.

If you do this often (for example via user uploads), it might be wise to create a queue for it, since it will take some time anyway. Still, 17k inserts should be doable in seconds, not minutes.

There is a lot of documentation available about optimizing MySQL for inserts; this page gives a general overview of what influences insert speed: http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
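One of the techniques that page lists is using INSERT statements with multiple VALUES lists (the "extended inserts" mentioned in the answer below). A rough PDO sketch of that idea, assuming the same auctions table and $db connection as in the question; the batch size of 500 is just an illustrative number:

    // Sketch: one multi-row INSERT per batch instead of one statement per row.
    $batchSize = 500; // illustrative; keep the statement under max_allowed_packet
    foreach (array_chunk($a, $batchSize) as $batch) {
        $placeholders = array();
        $values       = array();
        foreach ($batch as $row) {
            $placeholders[] = "(?, ?, ?, ?, ?, ?, ?)";
            array_push($values, $row->auc, $row->item, $row->owner,
                       $row->bid, $row->buyout, $row->quantity, $row->timeLeft);
        }
        $sql = "INSERT INTO auctions (auc, item, owner, bid, buyout, quantity, timeLeft) VALUES "
             . implode(", ", $placeholders);
        $db->prepare($sql)->execute($values);
    }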

At first sight it doesn't seem to be your script; it isn't doing anything special. Still, I would separate the process into 2 scripts:

  1. Handle the insertion in a process running in the background, like a cron job or a constantly running script. So: upload the data, store it, and let the background process handle the import. Maybe create an importjobs table for it (a rough sketch follows at the end of this answer). Every x records you could do something like this:

         UPDATE importjobs SET counter = :amountimported WHERE id = :jobid

  2. A script giving you the status, by reading the status messages written by the background script:

         SELECT counter/total AS partdone, counter, total FROM importjobs WHERE id = :jobid

That way you can measure and improve the process completely separately from the user interface: separation of concerns. You give the import full speed, and the progress indicator is fully decoupled from the process.

Depending on the speed you get, you can decide whether you want to poll every 5, 10, 20 or 60 seconds, or whatever. This lookup is quite cheap, so you can run it fairly often, mostly because the importjobs table is also very small.
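A rough sketch of the background-worker half of that split, assuming a hypothetical importjobs table with id, counter and total columns (total filled in when the job row was created) and the same decoded $a array as in the question; the status script would simply run the SELECT from point 2:

    // Sketch of the background import process: insert rows, report progress to importjobs.
    // Hypothetical table: importjobs (id INT PRIMARY KEY, counter INT, total INT)
    $jobId = 1; // hypothetical job id created when the upload was stored
    $total = count($a);

    $insert = $db->prepare("INSERT INTO auctions (auc, item, owner, bid, buyout, quantity, timeLeft)
                            VALUES (?, ?, ?, ?, ?, ?, ?)");
    $status = $db->prepare("UPDATE importjobs SET counter = :amountimported WHERE id = :jobid");

    foreach ($a as $i => $row) {
        $insert->execute(array($row->auc, $row->item, $row->owner, $row->bid,
                               $row->buyout, $row->quantity, $row->timeLeft));

        if (($i + 1) % 1000 === 0 || $i + 1 === $total) { // every x records, update the job status
            $status->execute(array(
                ':amountimported' => $i + 1,
                ':jobid'          => $jobId,
            ));
        }
    }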

Upvotes: 1

RumpRanger

Reputation: 654

If extended inserts are still not fast enough, try using LOAD DATA INFILE. It won't give you progress, but it is the fastest way I'm aware of to load your data.

Edit: here's a link to the MySQL manual page: http://dev.mysql.com/doc/refman/5.1/en/load-data.html
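A minimal sketch of that approach, assuming the JSON has already been decoded into $a as in the question, that the PDO connection was opened with PDO::MYSQL_ATTR_LOCAL_INFILE enabled, and that local_infile is allowed on the server; the temporary file path is just an example:

    // Sketch: dump the decoded rows to a CSV file, then bulk-load it in one statement.
    $csvFile = '/tmp/auctions.csv'; // example path
    $fp = fopen($csvFile, 'w');
    foreach ($a as $row) {
        fputcsv($fp, array($row->auc, $row->item, $row->owner, $row->bid,
                           $row->buyout, $row->quantity, $row->timeLeft));
    }
    fclose($fp);

    $db->exec("LOAD DATA LOCAL INFILE " . $db->quote($csvFile) . "
               INTO TABLE auctions
               FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
               LINES TERMINATED BY '\\n'
               (auc, item, owner, bid, buyout, quantity, timeLeft)");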

Upvotes: 0
