Reputation: 41
My task: receive a huge JSON string (1 GB or more in size), manipulate it (parse the JSON, do some formatting, restructure the data), and write the newly formatted JSON string out as the response. What is a better way to handle this scenario?
From a blog post I learned that streaming is an efficient way to handle huge amounts of data, but how do I manipulate the data while it is in a stream?
Exception/error that I face:
FATAL ERROR: CALL_AND_RETRY_2 Allocation failed - process out of memory
Aborted (core dumped)
Input JSON:
{
  "docs": [
    {
      "note_text": ["7657011|20MAR08|somehugedata|||7657012|20MAR09|somehugedata"],
      "id": "c123"
    },
    {
      "note_text": ["7657001|23MAR08|somehugedata|||7657002|21MAR08|somehugedata"],
      "id": "c124"
    }
  ]
}
New formatted JSON:
{
  "docs": [
    {
      "id": "c123",
      "note_text": "somehugedata",
      "date": "20MAR08",
      "note_id": "7657011"
    },
    {
      "id": "c123",
      "note_text": "somehugedata",
      "date": "20MAR09",
      "note_id": "7657012"
    },
    {
      "id": "c124",
      "note_text": "somehugedata",
      "date": "23MAR08",
      "note_id": "7657001"
    },
    {
      "id": "c124",
      "note_text": "somehugedata",
      "date": "21MAR08",
      "note_id": "7657002"
    }
  ]
}
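In plain (non-streaming) terms, the restructuring I need per document is roughly the following (a sketch only; it assumes each note_text entry is a list of "note_id|date|text" records separated by "|||", as in the samples above):

// Split one source doc into flat docs of the desired shape.
function flattenDoc(doc) {
  const out = [];
  for (const entry of doc.note_text) {
    for (const record of entry.split('|||')) {
      const [note_id, date, note_text] = record.split('|');
      out.push({ id: doc.id, note_text, date, note_id });
    }
  }
  return out;
}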
Upvotes: 0
Views: 1117
Reputation: 19220
Take a look at JSONStream. With it you don't have to load the whole huge JSON blob into memory: you process the objects in the source JSON one by one, specifying the proper selection pattern for them in JSONStream.parse().
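For example, a minimal sketch of such a pipeline (untested; the input file name, the 'docs.*' selection pattern, and the reshaping of note_text into separate docs are assumptions based on the samples in your question):

const fs = require('fs');
const { Transform } = require('stream');
const JSONStream = require('JSONStream');

// Restructure one source doc into several flat docs.
// Assumes each note_text entry holds "note_id|date|text" records
// separated by "|||", as in the sample data.
const reshape = new Transform({
  objectMode: true,
  transform(doc, _enc, done) {
    for (const entry of doc.note_text || []) {
      for (const record of entry.split('|||')) {
        const [note_id, date, note_text] = record.split('|');
        this.push({ id: doc.id, note_text, date, note_id });
      }
    }
    done();
  }
});

fs.createReadStream('input.json')                      // any readable source works
  .pipe(JSONStream.parse('docs.*'))                    // emits each element of "docs" as an object
  .pipe(reshape)                                       // restructure doc by doc
  .pipe(JSONStream.stringify('{"docs":[', ',', ']}'))  // re-serialize incrementally
  .pipe(process.stdout);                               // or pipe straight to the HTTP response

Because only one doc is in memory at a time, the process never has to hold the whole 1 GB payload, which is what triggers the CALL_AND_RETRY_2 allocation failure.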
Upvotes: 1