KKK
KKK

Reputation: 1662

Insert 1 million records using Laravel DB Seed

I'm using faker to get dummy data and trying to add 1million records. Somehow I only can reach around 100000 rows only, Following is my code

$no_of_rows = 1000000;

for( $i=1; $i <= $no_of_rows; $i++ ){
        $user_data[] = [
            'status' => 'ACTIVE',
            'username' => $faker->userName,
            'email' => $faker->email,
            'password' => $password,
            'firstname' => $faker->firstName,
            'surname' => $faker->lastName,
            'mobilenumber' => $faker->phoneNumber,
            'confirmed' => (int)$faker->boolean(50),
            'gender' => $faker->boolean(50) ? 'MALE' : 'FEMALE',
            'dob' => $faker->date(),
            'address_line_1' => $faker->address,
            'address_line_2' => '',
            'post_code' => $faker->postcode,
        ];


}

User::insert($user_data);

I'm getting following error message

PHP Fatal error:  Allowed memory size of 1073741824 bytes exhausted

I already set ini_set('memory_limit', '1024M');

Any useful thoughts or solutions?

Upvotes: 5

Views: 16974

Answers (5)

Maxim Paladi
Maxim Paladi

Reputation: 121

I tried to insert 16 million records. And I found the best and quick solution.

With Laravel 9 factories and collection chunking, 50.000 rows were inserted ~1 minutes.

With a simple array and array_chunk() I got ~30 seconds for 50.000 rows.

All 16.000.000 rows were inserted within 40 mins :)

use Faker\Generator;
use Illuminate\Container\Container;
        $accounts = User::query()->pluck('id');
        $faker = Container::getInstance()->make(Generator::class);
        $ownProductId = $faker->regexify('[LD0-9]{1}[A-Z0-9]{9}');
        $data = [];

        for($i=0; $i< 320; $i++) {
            for($v=0; $v< 50000; $v++) {
                $data[] = [
                    'createdAt'       => (int)date('U'),
                    'productId'       => $ownProductId,
                    'title'           => $faker->sentence(3),
                    'imageUrl'        => '',
                    'User_id'         => $accounts->random(),
                    'productIds'      => implode(',', $this->generateFakeProductIds($ownProductId)),
                    'parentProductId' => $faker->regexify('[LD0-9]{1}[A-Z0-9]{9}'),
                    'status' => $faker->randomElement(StatusesEnum::cases()),
                    'searchType'      => $faker->randomElement(SearchTypesEnum::cases()),
                    'url'             => Str::random(50),
                    'marketplace'     => $faker->randomElement(array_keys(MarketplacesConfigService::CONFIGS)),
                ];

            }

            $chunks = array_chunk($data, 5000);
            foreach ($chunks as $chunk) {
                TestModel::query()->insert($chunk);
            }
        }

Upvotes: 1

Piyush Prajapati
Piyush Prajapati

Reputation: 767

Hello: Here is very good and very Fast Insert data solution

$no_of_data = 1000000;
$test_data = array();
for ($i = 0; $i < $no_of_data; $i++){
  $test_data[$i]['number'] = "1234567890";
  $test_data[$i]['message'] = "Test Data";
  $test_data[$i]['status'] = "Delivered";
}
$chunk_data = array_chunk($test_data, 1000);
if (isset($chunk_data) && !empty($chunk_data)) {
  foreach ($chunk_data as $chunk_data_val) {
     DB::table('messages')->insert($chunk_data_val);
  }
}

Upvotes: 1

Matija Boban
Matija Boban

Reputation: 476

The core issue of this problem is that the Faker lib instance (usually used to generate data in Laravel) is memory heavy and it doesn't get properly cleared by the garbage collector while used in large loops.

I agree with the chucked processing that @Rob Mkrtchyan added above but since this is Laravel I would suggest a more elegant solution using the Factory facility.

You can create a specific model factory (in Laravel 5.3 this should be placed in database/factories/), for instance:

$factory->define(Tests::class, function (Faker\Generator $faker) {
    return [
        'status' => 'ACTIVE',
        'username' => $faker->userName,
        'email' => $faker->email,
        'password' => bcrypt('secret'),
        'firstname' => $faker->firstName,
        'surname' => $faker->lastName,
        'mobilenumber' => $faker->phoneNumber,
        'confirmed' => (int)$faker->boolean(50),
        'gender' => $faker->boolean(50) ? 'MALE' : 'FEMALE',
        'dob' => $faker->date(),
        'address_line_1' => $faker->address,
        'address_line_2' => '',
        'post_code' => $faker->postcode,
    ];
});

Then it's a simple matter of running the factory within your dB seeder class. Please note that the number 200 indicated the number of seed data entries to create.

factory(Tests::class, 200)
    ->create();

The reasons for using the seed factory is that it allows you much more flexibility in setting variables, etc. For documentation on this, you can consult the Laravel docs on dB seeding

Now, since you are dealing with a large number of records it's trivial to implement a chunked solution which will aid the php garbage collecting. For instance:

for ($i=0; $i < 5000; $i++) {
    factory(Tests::class, 200)
        ->create();
}

I did a quick test and in this configuration, your script memory usage should be around 12 - 15mb (depending on other system factors of course) regardless of the data entries created.

Upvotes: 10

Rob
Rob

Reputation: 127

Hello: Here is good solution

public function run(){
    for($j = 1; $j < 1000; $j++){
        for($i = 0; $i < 1000; $i++){
             $user_data[] = [
                 'status' => 'ACTIVE',
                 'username' => $faker->userName,
                 'email' => $faker->email,
                 'password' => $password,
                 'firstname' => $faker->firstName,
                 'surname' => $faker->lastName,
                 'mobilenumber' => $faker->phoneNumber,
                 'confirmed' => (int)$faker->boolean(50),
                 'gender' => $faker->boolean(50) ? 'MALE' : 'FEMALE',
                 'dob' => $faker->date(),
                 'address_line_1' => $faker->address,
                 'address_line_2' => '',
                 'post_code' => $faker->postcode,
             ];
        }

        User::insert($user_data);
    }
}

This code uses only 1000 lenght arrays in memory... and you can run this without changing any default php settings...

Enjoy,..

Upvotes: 0

Professor Abronsius
Professor Abronsius

Reputation: 33813

The variables set in the foreach loop never get used so if the only intention if the foreach loop was to add a million records you could do away with the foreach and use something like this? This way the array used to populate the db is redeclared on each iteration rather than having more and more entries added.

$no_of_rows = 1000000;

for( $i=0; $i < $no_of_rows; $i++ ){
    $user_data = array(
        'status' => 'ACTIVE',
        'username' => $faker->userName,
        'email' => $faker->email,
        'password' => $password,
        'firstname' => $faker->firstName,
        'surname' => $faker->lastName,
        'mobilenumber' => $faker->phoneNumber,
        'confirmed' => (int)$faker->boolean(50),
        'gender' => $faker->boolean(50) ? 'MALE' : 'FEMALE',
        'dob' => $faker->date(),
        'address_line_1' => $faker->address,
        'address_line_2' => '',
        'post_code' => $faker->postcode,
    );

    User::insert( $user_data );
    $user_data=null;
}

On the basis of your last comment I can see why the use of chunks - no way to know the syntax of the sql before posting answer so perhaps this might be more suitable?

$no_of_rows = 1000000;
$range=range( 1, $no_of_rows );
$chunksize=1000;

foreach( array_chunk( $range, $chunksize ) as $chunk ){
    $user_data = array();/* array is re-initialised each major iteration */
    foreach( $chunk as $i ){
        $user_data[] = array(
            'status' => 'ACTIVE',
            'username' => $faker->userName,
            'email' => $faker->email,
            'password' => $password,
            'firstname' => $faker->firstName,
            'surname' => $faker->lastName,
            'mobilenumber' => $faker->phoneNumber,
            'confirmed' => (int)$faker->boolean(50),
            'gender' => $faker->boolean(50) ? 'MALE' : 'FEMALE',
            'dob' => $faker->date(),
            'address_line_1' => $faker->address,
            'address_line_2' => '',
            'post_code' => $faker->postcode
        );      
    }
    User::insert( $user_data );
}

Upvotes: 5

Related Questions