Reputation: 792
I have this routine that fetches some data from a webservice and stores it in my database. This data has 20k+ items. To save them into the database, I have to retrieve some information first and then store them. So I have this foreach loop that runs 20k+ times, performing a read and a write to the database each time.
But this approach slows down over time. It takes more than an hour to finish!
I've disabled the query log (DB::disableQueryLog()), but I didn't notice any gain in performance.
Here's my code:
$data = API::getItems();

foreach ($data as $item) {
    $otherItem = OtherItem::where('something', $item['something'])->first();

    if (!is_null($otherItem)) {
        Item::create([
            ...
        ]);
    }
}
As a solution, I decided to pre-fetch all the OtherItem records into a collection, and it solved the problem:
$data = API::getItems();
$otherItems = OtherItem::all();

foreach ($data as $item) {
    $otherItem = $otherItems->where('something', $item['something'])->first();

    if (!is_null($otherItem)) {
        Item::create([
            ...
        ]);
    }
}
But I want to understand why the first approach slows down drastically over time, and what the best way is to do this sort of thing.
EDIT:
To clarify: I know that doing 20k queries is not performant and, in this case, performance is not important (unless it takes hours instead of minutes). I will only run this routine now and then while in development. My final approach was a mix of both answers (I hadn't thought of buffering the items and inserting them in batches). Here's the code for anyone interested:
$data = collect(API::getPrices());
$chunks = $data->chunk(500);
$otherItems = OtherItem::all();

foreach ($chunks as $items) {
    $buffer = [];

    foreach ($items as $item) {
        $otherItem = $otherItems->where('something', $item['something'])->first();

        if (!is_null($otherItem)) {
            $buffer[] = [
                ...
            ];
        }
    }

    Item::insert($buffer);
}
So, what is bothering me is why it is painfully slow (even accounting for all the queries). I've decided to do some benchmarking to analyse the question further. With the two-query approach I get the following results:
For a 6000-iteration loop:
Mean read: 0.3196 s
Max write: 0.9133 s
Every 10-20 iterations the read time goes up to over a second for 2-3 iterations, which is weird, and I have no idea why.
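For reference, a minimal sketch of how per-iteration timings like these could be collected with microtime(); the $readTimes / $writeTimes variables are made up for illustration, and the query and create() call are the ones from the first snippet:
// Hypothetical timing harness around the original two-query loop.
$readTimes = [];
$writeTimes = [];

foreach ($data as $item) {
    $start = microtime(true);
    $otherItem = OtherItem::where('something', $item['something'])->first();
    $readTimes[] = microtime(true) - $start; // time spent on the read query

    if (!is_null($otherItem)) {
        $start = microtime(true);
        Item::create([
            // ... columns, as in the snippet above
        ]);
        $writeTimes[] = microtime(true) - $start; // time spent on the insert
    }
}

echo 'Mean read: ' . array_sum($readTimes) / count($readTimes) . " s\n";
echo 'Max write: ' . max($writeTimes) . " s\n";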
Just out of curiosity, I've also benchmarked the difference between chunking and buffering the items before inserting them into the DB:
Upvotes: 2
Views: 318
Reputation: 163768
In the first code snippet, you're creating 40000 queries for 20000 items. That's two queries per item - the first one fetches the data, the second one stores the new item.
The second code snippet will create 20001 queries, and it's a very slow solution too.
You can build an array and use insert() instead of calling the create() method each time you want to store some data. That way, this code will create just 2 queries instead of 40000 or 20001:
$otherItems = OtherItem::all();
$items = [];

foreach ($data as $item) {
    $otherItem = $otherItems->where('something', $item['something'])->first();

    if (!is_null($otherItem)) {
        $items[] = [.....];
    }
}

Item::insert($items);
Upvotes: 2
Reputation: 26765
It slows down because there are simply so many queries - each one is a round trip to the database.
Another thing you can do is try chunking the inserts with database transactions. Play around with the exact numbers but try inserting in batches of a few hundred or so.
For example, Laravel's ORM provides a chunk method for this kind of use case.
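As an illustration only (not this answer's code), here's roughly what batched inserts wrapped in transactions could look like, reusing the names from the question; the chunk size of 500 is just a placeholder, and Collection::chunk is used to split the API data (the Eloquent chunk() method mentioned above processes query results in batches in a similar way):
// Sketch: split the data into batches, wrap each batch in a transaction,
// and write the whole batch with a single insert() call.
$otherItems = OtherItem::all();

foreach (collect(API::getItems())->chunk(500) as $chunk) {
    DB::transaction(function () use ($chunk, $otherItems) {
        $buffer = [];

        foreach ($chunk as $item) {
            $otherItem = $otherItems->where('something', $item['something'])->first();

            if (!is_null($otherItem)) {
                $buffer[] = [
                    // ... columns for the new Item, as in the question
                ];
            }
        }

        Item::insert($buffer);
    });
}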
Upvotes: 0