Reputation: 6070
I'm trying to decode a large JSON file, 222 MB.
I understand I can't use json_decode() directly by reading the whole file with file_get_contents() and decoding the whole string, as it would consume a lot of memory and return nothing (this is what it's doing so far).
So I went to try out libraries. The one I tried recently is JSONParser; what it does is read the objects of a JSON array one by one.
But due to the lack of documentation there, I want to ask here if anyone has worked with this library.
This is the example test code from GitHub:
// initialise the parser object
$parser = new JSONParser();
// sets the callbacks
$parser->setArrayHandlers('arrayStart', 'arrayEnd');
$parser->setObjectHandlers('objStart', 'objEnd');
$parser->setPropertyHandler('property');
$parser->setScalarHandler('scalar');
/*
echo "Parsing top level object document...\n";
// parse the document
$parser->parseDocument(__DIR__ . '/data.json');*/
$parser->initialise();
//echo "Parsing top level array document...\n";
// parse the top level array
$parser->parseDocument(__DIR__ . '/array.json');
How do I use a loop and save each object in a PHP variable that we can easily decode to a PHP array for further use?
This would take some time, as it would do this one by one for all the objects of the JSON array, but the question stands: how do I loop over it using this library, or is there no such option?
Or are there any other better options or libraries for this sort of job?
Upvotes: 16
Views: 21167
Reputation: 11
Not a reply intended for the OP, but an alternative method for anyone else looking into this topic...
You CAN use json_decode() on ANY size file with next to no memory use. Yep, the best of both worlds. I tried several solutions, such as jsonmachine and json_decode() as designed; some methods were fast but crashed from digesting the entire file at once, while others completed but were painfully slow.
My solution is to break the JSON file apart into smaller sections and process each with json_decode(). I did this by setting the head and the end of the JSON file to variables (or constants), then concatenating head + body excerpt + end and processing each batch separately. The body excerpt was 200-400 records, but it can be anything the system can handle. I am sure some people will have something negative to say about this, but in essence it is the same as manually making many small JSON files and processing them individually. This method simply does it for you; it is relatively fast and can handle a file of literally any size.
My sample file had 1,177,437 records (3.8 GB) and involved several operations to prepare the data, such as many coordinate conversions, string manipulations, SQL queries to retrieve additional data to be included, and gzdeflate(). It created SQL statements that were executed and completed in 37 min with no errors, averaging 530 SQL records created per second. The table ended up being 5.2 GB when all was said and done. If you know that the file(s) will be formatted 100% correctly, this can be sped up by reading an entire line as opposed to 1 character at a time. I opted for 1 character at a time because on occasion I get GeoJSON files with no line breaks, and I designed it for maximum compatibility first, speed second.
Tips: I found that preg_match() worked well to extract the head of the file, while simply looking for an equal quantity of opening and closing curly brackets within a string indicated a complete record. The end of the file was a simple "\n]\n}\n" that I hard-coded because it is common to all my files.
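Here is a rough sketch of the idea, not the author's actual code: the file name huge.json, the top-level "records" key, the batch size of 300, and the assumption that no curly brackets occur inside string values are all placeholders I chose for illustration.
<?php
// Sketch only: huge.json, the "records" key, and the batch size of 300
// are placeholder assumptions about the file's shape.
$file = fopen('huge.json', 'rb');

// 1) Consume the head: everything up to and including the first '['.
//    (The answer above extracted it with preg_match() instead.)
$head = '';
while (($char = fgetc($file)) !== false) {
    $head .= $char;
    if ($char === '[') {
        break;
    }
}
$end = "\n]\n}"; // hard-coded closing, common to all files per the tips

$batch  = [];    // complete records collected for the current chunk
$buffer = '';    // the record currently being accumulated
$depth  = 0;     // open minus closed curly brackets

// 2) Read 1 character at a time; an equal quantity of opening and
//    closing curly brackets marks a complete record. (Brackets inside
//    string values would need extra handling.)
while (($char = fgetc($file)) !== false) {
    if ($char === '{') {
        $depth++;
    }
    if ($depth > 0) {
        $buffer .= $char;
    }
    if ($char === '}' && $depth > 0 && --$depth === 0) {
        $batch[] = $buffer;
        $buffer  = '';
        if (count($batch) >= 300) { // 200-400 records per batch
            $chunk = $head . implode(',', $batch) . $end;
            foreach (json_decode($chunk, true)['records'] as $record) {
                // ... per-record processing goes here ...
            }
            $batch = [];
        }
    }
}

// 3) Flush the final partial batch.
if ($batch) {
    $chunk = $head . implode(',', $batch) . $end;
    foreach (json_decode($chunk, true)['records'] as $record) {
        // ... per-record processing goes here ...
    }
}
fclose($file);
Each json_decode() call only ever sees one small, complete JSON document, so memory use stays bounded by the batch size rather than the file size.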
Upvotes: 1
Reputation: 778
Another alternative is to use halaxa/json-machine.
Usage for iterating over JSON is the same as with json_decode(), but it will not hit the memory limit no matter how big your file is. There is no need to implement anything; just use your foreach.
Example:
$users = \JsonMachine\JsonMachine::fromFile('500MB-users.json');
foreach ($users as $id => $user) {
    // process $user as usual
}
See the GitHub README for more details.
Upvotes: 17
Reputation: 15131
One alternative here is to use salsify/jsonstreamingparser.
You need to create your own Listener.
$testfile = '/path/to/file.json';
$listener = new MyListener();
$stream = fopen($testfile, 'r');
try {
    $parser = new \JsonStreamingParser\Parser($stream, $listener);
    $parser->parse();
    fclose($stream);
} catch (Exception $e) {
    fclose($stream);
    throw $e;
}
To make things simple to understand, I'm using this JSON as an example:
JSON Input
{
    "objects": [
        {
            "propertyInt": 1,
            "propertyString": "string",
            "propertyObject": { "key": "value" }
        },
        {
            "propertyInt": 2,
            "propertyString": "string2",
            "propertyObject": { "key": "value2" }
        }
    ]
}
You need to implement your own listener. In this case, I just want to get the objects inside the array.
PHP
class MyListener extends \JsonStreamingParser\Listener\InMemoryListener
{
    // control variable that lets us know whether this is a child or parent object
    protected $level = 0;

    protected function startComplexValue($type)
    {
        // a complex value starts, so increment our level
        $this->level++;
        parent::startComplexValue($type);
    }

    protected function endComplexValue()
    {
        // a complex value ends, so decrement our level
        $this->level--;
        $obj = array_pop($this->stack);
        // If the value stack is now empty, we're done parsing the document, so we
        // can move the result into place so that getJson() can return it.
        // Otherwise, we associate the value.
        if (empty($this->stack)) {
            $this->result = $obj['value'];
        } else {
            if ($obj['type'] == 'object') {
                // insert the value into the top object, the way the original listener does
                $this->insertValue($obj['value']);
                // HERE I call the custom function to do what I want
                $this->insertObj($obj);
            }
        }
    }

    // custom function to do whatever you need
    protected function insertObj($obj)
    {
        // parent object
        if ($this->level <= 2) {
            echo "<pre>";
            var_dump($obj);
            echo "</pre>";
        }
    }
}
Output
array(2) {
  ["type"]=>
  string(6) "object"
  ["value"]=>
  array(3) {
    ["propertyInt"]=>
    int(1)
    ["propertyString"]=>
    string(6) "string"
    ["propertyObject"]=>
    array(1) {
      ["key"]=>
      string(5) "value"
    }
  }
}
array(2) {
  ["type"]=>
  string(6) "object"
  ["value"]=>
  array(3) {
    ["propertyInt"]=>
    int(2)
    ["propertyString"]=>
    string(7) "string2"
    ["propertyObject"]=>
    array(1) {
      ["key"]=>
      string(6) "value2"
    }
  }
}
I tested it against a 166 MB JSON file and it works. You may need to adapt the listener to your needs.
Upvotes: 8
Reputation: 43441
You still need to use json_decode() and file_get_contents() to get the full JSON (you can't parse partial JSON). Just increase PHP's memory limit to a bigger value using ini_set('memory_limit', '500M');
Also, you will be processing for longer, so use set_time_limit(0);
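Put together, a minimal sketch of that approach (the path and the limit value are placeholders):
<?php
// Placeholders: adjust the path and the memory limit to your setup.
ini_set('memory_limit', '500M'); // room for the raw string plus decoded data
set_time_limit(0);               // remove the execution time limit

$json = file_get_contents('/path/to/large.json'); // read the whole file
$data = json_decode($json, true);                 // decode into a PHP array
Note that the decoded array typically needs several times the file's size in memory, so the limit may have to be raised well above the file size itself.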
Upvotes: -13