Borut
Borut

Reputation: 8138

PHP's natural sort memory issue on huge amount of data

I'm using php's natural sorting algorithm natsort but I have a consideration about memory usage.

This is how it goes. Script pulls data from mysql and put results into an array and than apply natsort over it. But here is the catch. Row's text can be long and there could be hundreds of rows.

Example code:

$array = array();
while ($row = $db->getResults()) {
  $array[$row->code] = $row->text;
}

if (empty($array)) {
  uksort($array, "strnatcmp");
}

I wonder how is this affecting memory? Is this appropriate approach or should I do something more efficient, more memory pleasant?

Upvotes: 1

Views: 659

Answers (2)

goat
goat

Reputation: 31824

One thing you can do is store a new column which duplicates the column you want to sort, but stores it in a transformed format that will sort naturally when using the regular sort algorithm.

Conceptually, you can do this by left-padding digit sequences with zeros to a length that will be as long as the longest possible numeric sequence that could occur in your string.

My solution isn't totally rugged, but if your strings just have digit sequences of known maximum lengths, then you can left pad them with zeros to that known max length. For example, if you had cd track titles with the track number embedded into the title like:

1 Foo
2 Bar
...
10 Baz

Maybe you decide that the longest numeric sequence possible would be 3(999 possible tracks), so you would pad the numeric sequences like

001 Foo
002 Bar
...
010 Baz

This works fine with strings that have multiple numeric sequences.

Sample php code, although you could write a mysql stored function to do this, and then use insert and update triggers on the table so that it's maintained transparently.

$input = 'a44b1c399d4';
$nat = preg_replace_callback('#\d+#', function($m) {
    return str_pad($m[0], 3, '0', STR_PAD_LEFT);
}, $input);
echo $nat; // a044b001c399d004

Then just sort in mysql via

order by natsort_column

This also lets you put an index on that column, giving you good sort performance.

Upvotes: 2

Xeoncross
Xeoncross

Reputation: 57254

You need to use the MySQL WHERE, GROUP BY, and ORDER BY clauses so you don't waste time at the PHP level parsing thousands of unneeded records.

Upvotes: -1

Related Questions