James Nine

Reputation: 2618

PHP: How to extract JSON strings out of a string dump

I have a huge string dump that contains a mix of regular text and JSON. I want to separate/remove the JSON objects from the string dump and get the text only.

Here is an example:

This is some text {'JSON':'Object'} Here's some more text {'JSON':'Object'} Yet more text {'JSON':'Object'} Again, some text.

My goal is to get a text dump that looks like this (basically the JSON is removed):

This is some text Here's some more text Yet more text Again, some text.

I need to do this all in PHP. The text dump is always random, and so is the JSON data structure (most of it is deeply nested). The dump may or may not start with JSON, and it may or may not contain more than one JSON object.

I have tried using json_decode on the string, but the result ends up as NULL.

EDIT: Amal's answer is really close to what I want (see the 2nd comment below):

$str = preg_replace('#\{.*?\}#s', '', $str);

However, it doesn't get rid of nested objects at all; e.g. data contained in brackets: [] or [{}]
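To illustrate the problem, here is a minimal sketch; the sample string is made up and is not taken from the gist. The lazy .*? stops at the first closing brace it finds, so the brace that closes the outer object is left behind in the text:

```php
<?php
// Made-up sample with one level of nesting (an assumption for illustration).
$str = 'text {"a":{"b":1}} more';

// The lazy quantifier ends the match at the first "}", not the matching one.
$result = preg_replace('#\{.*?\}#s', '', $str);

echo $result, PHP_EOL; // text } more
```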

Sorry, I'm not an expert in regex.

I realized that some of you may need a more concrete example of the string dump I'm dealing with; therefore I've created a gist (please note that this is not static data; the data in the dump will always be different; my example above just simplifies the string I'm working with): https://gist.github.com/anonymous/6855800

Upvotes: 5

Views: 10294

Answers (3)

Brian F

Reputation: 11

Here is a working code snippet based on animesh seth's answer.

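if (strpos($msg, '{') !== false) {
    // Collect each top-level {...} block separately; concatenating several
    // objects into one string would make json_decode() return null.
    $objects = [];
    $json = '';
    $in = 0; // current brace nesting depth
    foreach (str_split($msg) as $char) {
        if ($char == '{') {
            $in++;
        }
        if ($in > 0) {
            $json .= $char;
        }
        if ($char == '}' && $in > 0) {
            $in--;
            if ($in == 0) {
                // a top-level object just closed: decode it and start over.
                $objects[] = json_decode($json);
                $json = '';
            }
        }
    }
    // do something with the decoded objects in $objects.
}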

Upvotes: 1

Jerry

Reputation: 71568

I wanted you to post the code you used in your attempt with json_decode, but oh well...

You can use a recursive regex for nested braces in PHP:

$res = preg_replace('~\{(?:[^{}]|(?R))*\}~', '', $text);

regex101 demo (The part highlighted in blue will be removed).
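As a usage sketch, the pattern above can be wrapped in a small function and run against the sample from the question. The function name stripJsonObjects and the whitespace clean-up step are assumptions added for illustration; only the recursive pattern comes from this answer. Note that the pattern balances braces only, so a brace inside a JSON string value, or a top-level [...] array, is not handled.

```php
<?php
// Hypothetical helper; only the recursive pattern is taken from the answer.
function stripJsonObjects(string $text): string
{
    // A brace pair containing non-brace characters or, via (?R),
    // further balanced brace pairs. Falls back to the input on a PCRE error.
    $stripped = preg_replace('~\{(?:[^{}]|(?R))*\}~', '', $text) ?? $text;

    // Assumption: collapse the double spaces left where objects were removed.
    $collapsed = preg_replace('~[ \t]{2,}~', ' ', $stripped) ?? $stripped;

    return trim($collapsed);
}

$dump = "This is some text {'JSON':'Object'} Here's some more text "
      . "{'JSON':'Object'} Yet more text {'JSON':'Object'} Again, some text.";

echo stripJsonObjects($dump), PHP_EOL;
// This is some text Here's some more text Yet more text Again, some text.
```

On very large dumps, PCRE backtracking or recursion limits might be hit, in which case preg_replace returns null; the sketch then returns the input unchanged rather than failing.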

Upvotes: 16

animesh seth

Reputation: 29

Take a stack and start iterating over the string from the beginning.

for ($i = 0; $i < strlen($str); $i++) {
}

Whenever you find $str[$i] == '{', push this element onto the stack and set the start variable to $i:

$start = $i;

Now, whenever a { or [ occurs in the string, push it onto the stack. If a } or ] occurs and the top of the stack is not the matching { or [, this is not valid JSON. Otherwise, pop the top of the stack, and keep doing so until the stack is empty.

At that point you get $end = $i;

This will be one of the JSON strings (from $start to $end). Push this string into another array that keeps all the JSON strings.

Keep processing until you reach the end.
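The steps above could be sketched roughly as follows. The function name splitTextAndJson, its return shape, and the decision to treat mismatched or unclosed chunks as plain text are assumptions made for illustration. The sketch does not look inside string literals, so a bracket within a JSON string value would confuse it, and bracketed plain text such as "[sic]" would be collected as a chunk.

```php
<?php
// Hypothetical helper: returns [plain text, list of bracket-balanced chunks].
function splitTextAndJson(string $str): array
{
    $pairs  = ['}' => '{', ']' => '[']; // closer => expected opener
    $stack  = [];                       // currently open brackets
    $start  = 0;                        // offset where the current chunk began
    $text   = '';
    $chunks = [];
    $length = strlen($str);

    for ($i = 0; $i < $length; $i++) {
        $char = $str[$i];
        if ($char === '{' || $char === '[') {
            if (!$stack) {
                $start = $i; // a new top-level chunk begins here
            }
            $stack[] = $char;
        } elseif ($stack && isset($pairs[$char])) {
            if (end($stack) !== $pairs[$char]) {
                // Mismatched closer: not valid JSON, so keep the span as text.
                $text .= substr($str, $start, $i - $start + 1);
                $stack = [];
                continue;
            }
            array_pop($stack);
            if (!$stack) {
                // The stack is empty again: $start..$i is one complete chunk.
                $chunks[] = substr($str, $start, $i - $start + 1);
            }
        } elseif (!$stack) {
            $text .= $char; // outside any chunk: ordinary text
        }
    }
    if ($stack) {
        $text .= substr($str, $start); // unclosed chunk: keep it as text
    }
    return [$text, $chunks];
}

[$text, $chunks] = splitTextAndJson('a {"x":[1,{"y":2}]} b [3,4] c');
```

Each entry in $chunks can then be passed to json_decode, and anything that fails to decode can be appended back to the text if it should be preserved.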

Upvotes: 1
