Tomalak

Reputation: 338278

PHP - *fast* serialize/unserialize?

I have a PHP script that builds a binary search tree over a rather large CSV file (5MB+). This is nice and all, but it takes about 3 seconds to read/parse/index the file.

Now I thought I could use serialize() and unserialize() to quicken the process. When the CSV file has not changed in the meantime, there is no point in parsing it again.

To my horror I find that calling serialize() on my index object takes 5 seconds and produces a huge (19MB) text file, whereas unserialize() takes an unbearable 27 seconds to read it back. Improvements look a bit different. ;-)

So - is there a faster mechanism to store/restore large object graphs to/from disk in PHP?

(To clarify: I'm looking for something that takes significantly less than the aforementioned 3 seconds to do the de-serialization job.)
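
For reference, the mtime-guarded cache described above looks roughly like this; build_index() is a hypothetical stand-in for the read/parse/index step:

// Rebuild the index only when the CSV is newer than the cache file.
$csv   = 'data.csv';
$cache = 'data.cache';

if (file_exists($cache) && filemtime($cache) >= filemtime($csv)) {
    $index = unserialize(file_get_contents($cache)); // the 27-second problem
} else {
    $index = build_index($csv);                      // the 3-second parse
    file_put_contents($cache, serialize($index));    // the 5-second, 19MB write
}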

Upvotes: 18

Views: 20720

Answers (8)

Asad Hasan

Reputation: 311

Try igbinary...did wonders for me:

http://pecl.php.net/package/igbinary
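
It's a binary drop-in replacement for serialize()/unserialize(). A minimal sketch of using it for the index cache (the file name is illustrative):

// igbinary writes a compact binary format instead of PHP's text format.
file_put_contents('index.bin', igbinary_serialize($index));

// ...later, read it back:
$index = igbinary_unserialize(file_get_contents('index.bin'));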

Upvotes: 7

Baris CUHADAR

Reputation: 59

First, you have to change the way your program works: divide the CSV file into smaller chunks. This is an IP datastore, I assume.

Convert each IP address to an integer.

That way, when a query comes in, you know which chunk to look in. PHP's ip2long() and long2ip() functions do the conversion. Split the 0 to 2^32 range so that all the IP addresses end up in about 100 smaller files (e.g. 5000K records at 50K per file). Each of those chunks serializes and unserializes much more quickly; a sketch follows.
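
A minimal sketch of the bucketing, assuming one serialized chunk file per bucket (the bucket count and file layout are illustrative):

// Map an IP address to one of 100 buckets covering 0..2^32.
function bucket_for_ip($ip, $buckets = 100) {
    $n = sprintf('%u', ip2long($ip)); // force unsigned on 32-bit builds
    return (int) floor($n / (4294967296 / $buckets));
}

// Load and search only the one small chunk that can contain the address.
$chunk = unserialize(file_get_contents('chunks/' . bucket_for_ip('203.0.113.7') . '.ser'));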

Think smart, code tidy ;)

Upvotes: 5

Daniel Beardsley

Reputation: 20367

What about using something like JSON as the format for storing/loading the data? I have no idea how fast the JSON parser is in PHP, but parsing JSON is usually a fast operation in most languages, and it's a lightweight format.

http://php.net/manual/en/book.json.php
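
A sketch of that approach; note that json_decode() gives back stdClass objects or plain arrays, so a tree of custom node objects would have to be rebuilt afterwards:

// Dump the parsed structure as JSON...
file_put_contents('index.json', json_encode($index));

// ...and read it back; `true` asks for associative arrays instead of stdClass.
$index = json_decode(file_get_contents('index.json'), true);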

Upvotes: 0

Brent Baisley

Reputation: 12721

SQLite comes with PHP, so you could use that as your database. Otherwise you could try using sessions; then you don't have to serialize anything yourself, you just save the raw PHP object.
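
For the SQLite route, a minimal PDO sketch (the table layout and names are illustrative):

$db = new PDO('sqlite:index.db');
$db->exec('CREATE TABLE IF NOT EXISTS ips (ip INTEGER PRIMARY KEY, payload TEXT)');

// After a one-time import, lookups skip the CSV parse entirely.
$stmt = $db->prepare('SELECT payload FROM ips WHERE ip = ?');
$stmt->execute(array(ip2long('203.0.113.7')));
$payload = $stmt->fetchColumn();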

Upvotes: 0

dave1010

Reputation: 15425

var_export should be lots faster as PHP won't have to process the string at all:

// export the processed CSV to export.php
$php_array = read_parse_and_index_csv($csv); // takes 3 seconds
$export = var_export($php_array, true);
file_put_contents('export.php', '<?php $php_array = ' . $export . '; ?>');

Then include export.php when you need it:

include 'export.php';

Depending on your web server setup, you may have to chmod export.php so that it's readable by the web server.

Upvotes: 15

zaf

Reputation: 23254

It seems that the answer to your question is no.

Even if you discover a "binary serialization format" option, most likely even that would be too slow for what you envisage.

So, what you may have to look into using (as others have mentioned) is a database, memcached, or an online web service.

I'd like to add the following ideas as well:

  • caching of requests/responses
  • your PHP script does not shut down but instead becomes a network server that answers queries (see the sketch after this list)
  • or, dare I say it, change the data structure and method of query you are currently using
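
A bare-bones sketch of the network-server idea using PHP's stream sockets; the port, the one-query-per-line protocol, and the build_index()/lookup() helpers are all made up:

// Parse the CSV once at startup, then answer queries without ever re-parsing.
$index  = build_index('data.csv');
$server = stream_socket_server('tcp://127.0.0.1:9999', $errno, $errstr);

while ($conn = stream_socket_accept($server, -1)) {
    $query = trim(fgets($conn));                  // e.g. one IP address per line
    fwrite($conn, lookup($index, $query) . "\n");
    fclose($conn);
}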

Upvotes: 4

user187291

Reputation: 53940

I see two options here:

String serialization; in the simplest form, something like:

  write => implode("\x01", (array) $node);
  read  => explode() + $node->payload = $a[0]; $node->value = $a[1]; etc.

Binary serialization with pack():

  write => pack("fnna*", $node->value, $node->le, $node->ri, $node->payload);
  read  => $node = (object) unpack("fvalue/nle/nri/a*payload", $data);

It would be interesting to benchmark both options and compare the results.
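
A self-contained round trip of the pack() variant to time; the values are illustrative, and note that 'n' is a 16-bit field, so le/ri must fit in 0..65535:

$node = (object) array('value' => 1.5, 'le' => 10, 'ri' => 20, 'payload' => 'x');

$t    = microtime(true);
$data = pack('fnna*', $node->value, $node->le, $node->ri, $node->payload);
$back = (object) unpack('fvalue/nle/nri/a*payload', $data);
printf("round trip: %.6f s\n", microtime(true) - $t);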

Upvotes: 2

selfawaresoup

Reputation: 15832

If you want speed, writing to or reading from the file system is less than optimal.

In most cases, a database server will be able to store and retrieve data much more efficiently than a PHP script that is reading/writing files.

Another possibility would be something like Memcached.
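
For example, with the pecl Memcache extension (the server details and build_index() helper are assumed):

$mc = new Memcache();
$mc->connect('127.0.0.1', 11211);

// Keep the parsed index in RAM across requests instead of on disk.
if (($index = $mc->get('csv_index')) === false) {
    $index = build_index('data.csv');
    $mc->set('csv_index', $index);
}

Note that the extension still serializes objects internally when storing them, so it's worth measuring whether this actually beats a local file.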

Object serialization is not known for its performance but for its ease of use, and it's definitely not suited to handling large amounts of data.

Upvotes: 1
