Norgul

Reputation: 4783

Optimizing database update

I have to update a really large set of data, and so far I have this:

protected function updateWidgetsDb($APIWidgets, $date)
{
    echo "Eager loading...";
    $widgets = Widget::where('date', $date)->get();
    echo "Done\n";
    echo "Updating...";
    foreach ($APIWidgets as $APIWidget) {
        $widget = $widgets->where('widget_id', $APIWidget->dimensions[0])->first();

        if ($widget == null)
            continue;

        $widget->update(['revenue' => $APIWidget->metrics[0]->values[0]]);
    }
    echo "Done\n";
}

$APIWidgets is an array fetched through an external API. I am fetching data only for a specific date, so I load only the rows for that date which already exist in the DB.
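
A simplified sketch of the shape the code above assumes each element has (values are hypothetical, inferred only from the dimensions[0] and metrics[0]->values[0] accesses):

    // Hypothetical element of $APIWidgets, shape inferred from the accesses above
    $APIWidget = (object) [
        'dimensions' => ['widget-123'],                     // [0] holds the widget_id
        'metrics'    => [(object) ['values' => ['42.50']]], // [0]->values[0] holds the revenue
    ];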

The widget_id field is indexed in the DB.
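
For reference, such an index could be added with a migration along these lines (class name is illustrative; a composite (widget_id, date) index would also cover the per-date lookups used below):

    use Illuminate\Database\Migrations\Migration;
    use Illuminate\Database\Schema\Blueprint;
    use Illuminate\Support\Facades\Schema;

    class AddWidgetIdDateIndexToWidgetsTable extends Migration
    {
        public function up()
        {
            Schema::table('widgets', function (Blueprint $table) {
                // Composite index covering WHERE date = ? AND widget_id = ? lookups
                $table->index(['widget_id', 'date']);
            });
        }

        public function down()
        {
            Schema::table('widgets', function (Blueprint $table) {
                $table->dropIndex(['widget_id', 'date']);
            });
        }
    }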

EDIT:

I have about 60k records in the DB and receive about 2k records for the update. Each of those 2k records has an ID that already exists among the 60k, so after the update the total should still be 60k, not 62k.

Currently, the update process takes ~10 minutes.

Upvotes: 1

Views: 60

Answers (1)

shukshin.ivan

Reputation: 11340

I can propose two methods to speed up a mass update task. I tried to reproduce your issue, so I created a set of 60k*7 items in my widgets table with (widget_id, date) and widget_id indexes.

  1. First, let's make getting $widget faster. When I replaced $widgets->where('widget_id', $APIWidget->id) with

    Widget::where('date', $date)
        ->where('widget_id', $APIWidget->id)
        ->first()

    the script became ~400x faster. It looks like 2000 MySQL fetches using the (widget_id, date) index are faster than 2000 searches through a 60000-item Laravel collection. My results were 16 seconds for queries and ~6400 seconds for collections.

  2. Then let's make the update faster. There is a nice approach to updating a table with only one query using a temporary table. So I created a new entity WidgetUpdate with table widget_updates and fields id, widget_id and revenue_new; a possible migration for this helper table is sketched after the list.

    I've modified your method: first I collect an array of updates, then I do a bulk insert into the widget_updates table, and finally I perform a single UPDATE query that joins the two tables. My machine does it in 2.2 seconds, about 5x faster, which makes the overall speedup about 2000x in my case.

    protected function updateWidgetsDb($APIWidgets, $date)
    {
      echo "Updating...";
      $updates = [];

      foreach ($APIWidgets as $APIWidget) {
        $widget = Widget::where('date', $date)
            ->where('widget_id', $APIWidget->dimensions[0])
            ->first();

        // skip API rows that have no matching widget for this date
        if ($widget == null)
            continue;

        // remember which widgets row to update and its new revenue value
        $updates[] = [
            'widget_id'   => $widget->id,
            'revenue_new' => $APIWidget->metrics[0]->values[0],
        ];
      }

      # bulk insert into the helper table, then a single join update
      WidgetUpdate::insert($updates);
      DB::statement('UPDATE widgets, widget_updates ' .
                'SET widgets.revenue = widget_updates.revenue_new ' .
                'WHERE widgets.id = widget_updates.widget_id');

      echo "Done\n";
    }
    

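A possible migration and model for the widget_updates helper table described above (a sketch only; column types and class names are illustrative, adjust to your schema):

    // Migration for the helper table
    use Illuminate\Database\Migrations\Migration;
    use Illuminate\Database\Schema\Blueprint;
    use Illuminate\Support\Facades\Schema;

    class CreateWidgetUpdatesTable extends Migration
    {
        public function up()
        {
            Schema::create('widget_updates', function (Blueprint $table) {
                $table->increments('id');
                $table->unsignedInteger('widget_id')->index(); // references widgets.id
                $table->decimal('revenue_new', 12, 2);
            });
        }

        public function down()
        {
            Schema::dropIfExists('widget_updates');
        }
    }

    // Eloquent model (e.g. app/WidgetUpdate.php)
    class WidgetUpdate extends \Illuminate\Database\Eloquent\Model
    {
        protected $table = 'widget_updates';
        public $timestamps = false;   // plain helper table, no created_at/updated_at
        protected $guarded = [];
    }
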
Don't forget to clean up the temporary table when you're done.
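
For example, assuming the WidgetUpdate model sketched above:

    // empty the helper table so the next run starts clean
    WidgetUpdate::truncate();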

Upvotes: 1
