Reputation: 1662
I'm using Faker to generate dummy data and I'm trying to insert 1 million records. Somehow I can only reach around 100,000 rows. Here is my code:
$no_of_rows = 1000000;
for ($i = 1; $i <= $no_of_rows; $i++) {
    $user_data[] = [
        'status' => 'ACTIVE',
        'username' => $faker->userName,
        'email' => $faker->email,
        'password' => $password,
        'firstname' => $faker->firstName,
        'surname' => $faker->lastName,
        'mobilenumber' => $faker->phoneNumber,
        'confirmed' => (int)$faker->boolean(50),
        'gender' => $faker->boolean(50) ? 'MALE' : 'FEMALE',
        'dob' => $faker->date(),
        'address_line_1' => $faker->address,
        'address_line_2' => '',
        'post_code' => $faker->postcode,
    ];
}
User::insert($user_data);
I'm getting the following error message:
PHP Fatal error: Allowed memory size of 1073741824 bytes exhausted
I have already set ini_set('memory_limit', '1024M');
Any useful thoughts or solutions?
Upvotes: 5
Views: 16974
Reputation: 121
I tried to insert 16 million records and found the quickest solution.
With Laravel 9 factories and collection chunking, 50,000 rows were inserted in about 1 minute.
With a simple array and array_chunk(), 50,000 rows took about 30 seconds.
All 16,000,000 rows were inserted within 40 minutes :)
use Faker\Generator;
use Illuminate\Container\Container;
use Illuminate\Support\Str;

$accounts = User::query()->pluck('id');
$faker = Container::getInstance()->make(Generator::class);
$ownProductId = $faker->regexify('[LD0-9]{1}[A-Z0-9]{9}');

for ($i = 0; $i < 320; $i++) {
    // Re-initialise the buffer on every pass so memory stays flat
    // and no chunk is inserted twice.
    $data = [];
    for ($v = 0; $v < 50000; $v++) {
        $data[] = [
            'createdAt' => (int)date('U'),
            'productId' => $ownProductId,
            'title' => $faker->sentence(3),
            'imageUrl' => '',
            'User_id' => $accounts->random(),
            'productIds' => implode(',', $this->generateFakeProductIds($ownProductId)),
            'parentProductId' => $faker->regexify('[LD0-9]{1}[A-Z0-9]{9}'),
            'status' => $faker->randomElement(StatusesEnum::cases()),
            'searchType' => $faker->randomElement(SearchTypesEnum::cases()),
            'url' => Str::random(50),
            'marketplace' => $faker->randomElement(array_keys(MarketplacesConfigService::CONFIGS)),
        ];
    }
    $chunks = array_chunk($data, 5000);
    foreach ($chunks as $chunk) {
        TestModel::query()->insert($chunk);
    }
}
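The generateFakeProductIds() helper is not shown above; here is a minimal sketch of what it might look like, assuming it returns a small array of product IDs in the same format (this reconstruction is a guess, not the author's implementation):

// Hypothetical reconstruction of the helper called above; the real
// implementation is not shown in the answer.
private function generateFakeProductIds(string $ownProductId): array
{
    $faker = Container::getInstance()->make(Generator::class);
    $ids = [$ownProductId];

    // Append a few related fake IDs in the same [LD0-9][A-Z0-9]{9} format.
    for ($i = 0; $i < $faker->numberBetween(1, 5); $i++) {
        $ids[] = $faker->regexify('[LD0-9]{1}[A-Z0-9]{9}');
    }

    return $ids;
}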
Upvotes: 1
Reputation: 767
Here is a very fast insert solution:
use Illuminate\Support\Facades\DB;

$no_of_data = 1000000;
$test_data = [];
for ($i = 0; $i < $no_of_data; $i++) {
    $test_data[$i]['number'] = "1234567890";
    $test_data[$i]['message'] = "Test Data";
    $test_data[$i]['status'] = "Delivered";
}

// array_chunk() always returns an array, so no isset/empty check is needed.
foreach (array_chunk($test_data, 1000) as $chunk) {
    DB::table('messages')->insert($chunk);
}
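One caveat on choosing the chunk size, not mentioned above: MySQL prepared statements are limited to 65,535 bound placeholders, so the largest safe chunk is that limit divided by the number of columns per row. A quick way to compute it (the column count of 3 matches the messages rows above):

// MySQL allows at most 65,535 placeholders per prepared statement.
$max_placeholders = 65535;
$columns_per_row  = 3; // number, message, status

// Largest chunk size that stays within the placeholder limit.
$max_chunk_size = intdiv($max_placeholders, $columns_per_row); // 21845

$chunks = array_chunk($test_data, min(1000, $max_chunk_size));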
Upvotes: 1
Reputation: 476
The core issue here is that the Faker instance (commonly used to generate data in Laravel) is memory-heavy, and it doesn't get properly cleared by the garbage collector when used in large loops.
I agree with the chunked processing that @Rob Mkrtchyan suggested above, but since this is Laravel I would suggest a more elegant solution using the Factory facility.
You can create a specific model factory (in Laravel 5.3 this should be placed in database/factories/), for instance:
$factory->define(Tests::class, function (Faker\Generator $faker) {
    return [
        'status' => 'ACTIVE',
        'username' => $faker->userName,
        'email' => $faker->email,
        'password' => bcrypt('secret'),
        'firstname' => $faker->firstName,
        'surname' => $faker->lastName,
        'mobilenumber' => $faker->phoneNumber,
        'confirmed' => (int)$faker->boolean(50),
        'gender' => $faker->boolean(50) ? 'MALE' : 'FEMALE',
        'dob' => $faker->date(),
        'address_line_1' => $faker->address,
        'address_line_2' => '',
        'post_code' => $faker->postcode,
    ];
});
Then it's a simple matter of running the factory from your database seeder class. Please note that the number 200 indicates the number of seed data entries to create.
factory(Tests::class, 200)->create();
The reason for using a seed factory is that it gives you much more flexibility in setting attributes, overriding them per call, etc. For documentation, you can consult the Laravel docs on database seeding.
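For example, a factory call can override individual attributes at create time (the 'INACTIVE' value below is purely illustrative):

// Create 200 rows, overriding one attribute for this batch only.
factory(Tests::class, 200)->create([
    'status' => 'INACTIVE',
]);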
Now, since you are dealing with a large number of records, it's trivial to implement a chunked solution, which will aid PHP's garbage collection. For instance:
for ($i = 0; $i < 5000; $i++) {
    factory(Tests::class, 200)->create();
}
I did a quick test, and in this configuration your script's memory usage should stay around 12-15 MB (depending on other system factors, of course) regardless of the number of entries created.
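One way to verify that the footprint stays flat is to log peak memory inside the chunked loop; memory_get_peak_usage() is standard PHP, not something from the original test setup:

for ($i = 0; $i < 5000; $i++) {
    factory(Tests::class, 200)->create();

    // Print peak memory every 100 batches; the value should plateau quickly.
    if ($i % 100 === 0) {
        echo round(memory_get_peak_usage(true) / 1048576, 2) . " MB\n";
    }
}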
Upvotes: 10
Reputation: 127
Here is a working solution:
public function run()
{
    // Create the Faker instance and hashed password used below
    // (neither was defined in the original snippet).
    $faker = \Faker\Factory::create();
    $password = bcrypt('secret');

    for ($j = 0; $j < 1000; $j++) {
        // Re-initialise the buffer so only 1,000 rows are held in
        // memory at a time and no batch is inserted twice.
        $user_data = [];
        for ($i = 0; $i < 1000; $i++) {
            $user_data[] = [
                'status' => 'ACTIVE',
                'username' => $faker->userName,
                'email' => $faker->email,
                'password' => $password,
                'firstname' => $faker->firstName,
                'surname' => $faker->lastName,
                'mobilenumber' => $faker->phoneNumber,
                'confirmed' => (int)$faker->boolean(50),
                'gender' => $faker->boolean(50) ? 'MALE' : 'FEMALE',
                'dob' => $faker->date(),
                'address_line_1' => $faker->address,
                'address_line_2' => '',
                'post_code' => $faker->postcode,
            ];
        }
        User::insert($user_data);
    }
}
This code keeps only a 1,000-element array in memory at a time, so you can run it without changing any default PHP settings.
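If this run() method lives in a seeder class, say UsersTableSeeder (that name is illustrative), you can register it in DatabaseSeeder and kick it off with php artisan db:seed:

use Illuminate\Database\Seeder;

class DatabaseSeeder extends Seeder
{
    public function run()
    {
        // Runs the chunked user seeder above; invoke with `php artisan db:seed`.
        $this->call(UsersTableSeeder::class);
    }
}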
Enjoy!
Upvotes: 0
Reputation: 33813
The loop counter never gets used inside the loop body, so if the only intention of the loop was to add a million records, you could insert one row per iteration and use something like this. This way the array used to populate the DB is redeclared on each iteration rather than having more and more entries appended to it:
$no_of_rows = 1000000;
for ($i = 0; $i < $no_of_rows; $i++) {
    $user_data = array(
        'status' => 'ACTIVE',
        'username' => $faker->userName,
        'email' => $faker->email,
        'password' => $password,
        'firstname' => $faker->firstName,
        'surname' => $faker->lastName,
        'mobilenumber' => $faker->phoneNumber,
        'confirmed' => (int)$faker->boolean(50),
        'gender' => $faker->boolean(50) ? 'MALE' : 'FEMALE',
        'dob' => $faker->date(),
        'address_line_1' => $faker->address,
        'address_line_2' => '',
        'post_code' => $faker->postcode,
    );
    User::insert($user_data);
    $user_data = null;
}
On the basis of your last comment I can see why chunks are used; there was no way to know the SQL syntax before posting the answer, so perhaps this might be more suitable:
$no_of_rows = 1000000;
$range = range(1, $no_of_rows);
$chunksize = 1000;
foreach (array_chunk($range, $chunksize) as $chunk) {
    $user_data = array(); /* array is re-initialised on each major iteration */
    foreach ($chunk as $i) {
        $user_data[] = array(
            'status' => 'ACTIVE',
            'username' => $faker->userName,
            'email' => $faker->email,
            'password' => $password,
            'firstname' => $faker->firstName,
            'surname' => $faker->lastName,
            'mobilenumber' => $faker->phoneNumber,
            'confirmed' => (int)$faker->boolean(50),
            'gender' => $faker->boolean(50) ? 'MALE' : 'FEMALE',
            'dob' => $faker->date(),
            'address_line_1' => $faker->address,
            'address_line_2' => '',
            'post_code' => $faker->postcode
        );
    }
    User::insert($user_data);
}
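As a further speed-up (not part of the original answer), several chunk inserts can be grouped into a single transaction so the database commits once per batch instead of once per chunk. A sketch using Laravel's DB facade, reusing the $faker and $password variables from the snippet above:

use Illuminate\Support\Facades\DB;

$no_of_rows = 1000000;
$chunksize = 1000;
$chunks_per_txn = 50; // 50 chunks = 50,000 rows committed together

foreach (array_chunk(range(1, $no_of_rows), $chunksize * $chunks_per_txn) as $batch) {
    // One commit per batch instead of one per chunk insert.
    DB::transaction(function () use ($batch, $chunksize, $faker, $password) {
        foreach (array_chunk($batch, $chunksize) as $chunk) {
            $user_data = array();
            foreach ($chunk as $i) {
                $user_data[] = array(
                    'status' => 'ACTIVE',
                    'username' => $faker->userName,
                    'email' => $faker->email,
                    'password' => $password,
                    'firstname' => $faker->firstName,
                    'surname' => $faker->lastName,
                    'mobilenumber' => $faker->phoneNumber,
                    'confirmed' => (int)$faker->boolean(50),
                    'gender' => $faker->boolean(50) ? 'MALE' : 'FEMALE',
                    'dob' => $faker->date(),
                    'address_line_1' => $faker->address,
                    'address_line_2' => '',
                    'post_code' => $faker->postcode
                );
            }
            User::insert($user_data);
        }
    });
}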
Upvotes: 5