DaveBurns
DaveBurns

Reputation: 2096

Best practice to iterate over result set in Postgres/PHP/PDO?

I'm using PHP 5.3.6 with PDO to access Postgres 9.0.4. I've been asked to reduce the memory footprint of a report. The current implementation is simple: execute the query, do a fetchAll() and then iterate with foreach() through the resulting array. This obviously doesn't scale with huge result sets: it can temporarily consume 100MB or more.

I have a new implementation which takes the PDO statement handle and then iterates directly on it using foreach(), i.e. no intermediate array via fetchAll(). (From what I've read, iterating a statement handle with foreach calls fetch() under the covers.) This is just as fast and consumes way less memory: about 28kB. Still, I'm not confident I'm doing it right because, although I've done a ton of Googling, it's tough to find answers to basic questions about this:

I found that it made no memory or speed difference if I did this:

$sth = $dbh->prepare($sql, array( PDO::ATTR_CURSOR => PDO::CURSOR_FWDONLY ) );

Is this because this is the default anyway for the Postgres PDO driver? This would make sense if it is already using cursors internally.

General comments about the approach and other ways to solve this problem are welcome.

Upvotes: 1

Views: 2843

Answers (3)

Guillaume Outters
Guillaume Outters

Reputation: 1609

  • internal cursors:
    PHP has internal code that explicitely deal with cursors (as pointed by Jon's answer),
    however I did not manage to make the driver enter this code (using setAttribute()). Maybe because I focused on CURSOR_FWONLY while they do not work, as shown by JohnSmith's answer.
  • fetch() and network:
    network-wise, fetch() is as efficient as possible, it uses exactly the same flow as a fetchAll() where the server continuously pushes result rows (sometimes limited by the client not reading its socket fast enough). fetch() then works with the (small) network buffer as a C getline() would, buffering only until it finds an "end-of-row" marker to return it as a PHP array (and shifting the buffer until that marker); next call to fetch() will start with the remains of the network buffer (it may even find a full row in them and thus return immediately), and so on.
    … But this is theorical, and memory-wise, there's a real problem, see my dedicated paragraph below.
  • Iterator:
    Iterating over a statement required dedicated handling (while(($row = $stmt->fetch()))) until PHP 8's getIterator(): one could not have a function that interchangeably accepted a PDOStatement or an array to iterate over, they needed to have a specific loop, or wrap into an homemade Iterator.
  • CURSOR_FWDONLY:
    Nothing to add to JohnSmith's answer

fetch() not a real fetch() until PHP 8.4

Historically, fetch() did an internal fetchAll() that waited for the server to have finished sending the whole resulset (and buffering it entirely) before delivering the first row;
and as it was internal to the driver, I'm not even sure it was reported by memory_get_usage(). The only comfort in that was that maybe this was raw data memory, not yet converted to PHP array, so a little bit thrifty.
But it was thus of absolutely no use, and its name gave a false confidence feeling in its lightness.

PHP 8.4 introduced a PDO::ATTR_PREFETCH attribute which, set to 0 (either on the PDO, or the Statement) allows fetch() to work on-demand, row-by-row without storing the results in mem; as it always should have.

There are still side effects (that prevents introspecting while popping the results, for example with getColumnMeta()), but it is fully usable in a dump-to-CSV scenario.

Upvotes: 0

JohnSmith
JohnSmith

Reputation: 472

Apparently PDO::CURSOR_FWDONLY does not use cursors. Black box tests:

(0) Preparations:

$con = new \PDO('dsn');
// you'll get "NO ACTIVE TRANSACTION" otherwise
$con->beginTransaction();

$sql = 'select * from largetable';

(1) Default - takes forever:

$stmt = $con->prepare($sql);
$stmt->execute();
print_r($stmt->fetch());

(2) FWDONLY - takes forever:

$stmt = $con->prepare($sql, array(\PDO::ATTR_CURSOR => \PDO::CURSOR_FWDONLY));
$stmt->execute();
print_r($stmt->fetch());

(3) SCROLLABLE - runs in a flash:

$stmt = $con->prepare($sql, array(\PDO::ATTR_CURSOR => \PDO::CURSOR_SCROLL));
$stmt->execute();
print_r($stmt->fetch());

I turned on PG logging just to be sure and it is indeed so - only SCROLL uses cursors.

So, the only way to make use of cursors is to use SCROLL, at least in PHP 5.4.23.

Upvotes: 2

Jon
Jon

Reputation: 437684

PDO for Postgres does use cursors internally.

Upvotes: 1

Related Questions