Jared Eitnier
Jared Eitnier

Reputation: 7152

php & mysql: most efficient method to check large array against database

I have a large set of data stored in a multi-dimensional array. An example structure is as below:

Array
(
    [1] => Array
        (
            [0] => motomummy.com
            [1] => 1921
            [2] => 473
        )
    [4] => Array
        (
            [0] => kneedraggers.com
            [1] => 3051
            [2] => 5067
        )
)

I also have a table in a mysql database that currently contains ~80K domain names. This list will grow monthly by possibly ~10K+ domain names. The goal is to compare Array[][0] (the domain name) against the mysql database and return an array with preserved values (but key preservation is not important) that only contains unique values.

Please note, that I only want to compare the first index alone, NOT the entire array.

The initial multi-dimensional array is assumed to be enormous in size (more than likely anywhere from 100k to 10 million results). What is the best way to get data back that is not contained in the database?

What I am doing now is simply storing to an array, the complete list of domains from the database, then using the following function, comparing each value in the initial array against the database array. This is horribly slow and inefficient obviously.

// get result of custom comparison function
$clean = array_filter($INITIAL_LIST, function($elem) {
$wordOkay = true;

// check every word in "filter from database" list, store it only if not in list           
    foreach ($this->domains as $domain) {
        if (stripos($elem[0], $domain) !== false) {
            $wordOkay = false;
            break;
        }
    }

    return $wordOkay;
});

Some pseudo code or even actual code would be very helpful at this point.

Upvotes: 4

Views: 837

Answers (1)

dualed
dualed

Reputation: 10502

Use the DBMS! It was made for stuff like that.

  • Create a temporary table temp { id (fill with array index); url (filled with url)}

  • Fill it with your array's data

  • Ideally create an index on temp.url

  • Query the database:

    SELECT * FROM `temp` LEFT JOIN `urls`
    WHERE urls.url = temp.url AND urls.url IS NULL;
    

    (the table urls is your existing data)

Upvotes: 2

Related Questions