Edward
Edward

Reputation: 9778

How to fix badly formatted JSON in PHP?

I'm getting a data feed which is in JSON format and the only available format. In PHP, I'm using json_decode to decode the JSON, but it was breaking, and I found out that the JSON was generated in some places with double quotes in a person's nick name. I verified this using: http://jsonformatter.curiousconcept.com

I don't have control over the creation of the data, but I have to deal with this broken format when it occurs. This data after it's parsed will be put into a MySQL TABLE.

For example:

"contact1": "David "Dave" Letterman",

json_decode would return a NULL. If I manually saved the file, and changed it to single quotes around the nickname of Dave, then everything worked.

$json_string = file_get_contents($json_download);
$json_array = json_decode($json_string, true);

How do I fix the broken JSON format in json_string before it gets processed by json_decode? What should be done to pre-process the file, backslash the double quotes of the nickname? Or change them to single quotes? Is it even a good idea to store double quotes like this in MySQL?

I don't know when this might occur with each data feed, so I don't want to just check for contact1 if it has inner double quotes to fix them. Is there a way in PHP to take a line such as the above example, and backslash everything after the colon except the outer double quotes? Thanks!

This is the correct code for as provided by tftd:

<?php
// This:
// "contact1": "David "Dave" Letterman",
// Needs to look like this to be decoded by JSON:
// "contact1": "David \"Dave\" Letterman",

$data ='"contact1": "David "Dave" Letterman",';
function replace($match){
    $key = trim($match[1]);
    $val = trim($match[2]);

    if($val[0] == '"')
        $val = '"'.addslashes(substr($val, 1, -1)).'"';
    else if($val[0] == "'")
        $val = "'".addslashes(substr($val, 1, -1))."'";

    return $key.": ".$val;
}
$preg = preg_replace_callback("#([^{:]*):([^,}]*)#i",'replace',$data);
var_dump($preg);
$json_array = json_decode($preg);
var_dump($json_array);
echo $json_array . "\n";
echo $preg . "\n";
?>

Here is the output:

string(39) ""contact1": "David \"Dave\" Letterman","
NULL

"contact1": "David \"Dave\" Letterman",

Upvotes: 7

Views: 14166

Answers (6)

giorgia beavers
giorgia beavers

Reputation: 1

function fixJson($json) {
  $json = preg_replace('/,\s+/', ',', $json);
  $json = preg_replace('/}+\s*{+/', '},{', $json);
  $json = preg_replace('/,\s*]/', ']', $json);
  $json = preg_replace('/,\s*}/', '}', $json);
  $json = trim($json);

  return $json;
}

Upvotes: 0

Olivier Lutzwiller
Olivier Lutzwiller

Reputation: 1

Regexp are not reliable when comma and [] are in the values is contains json strings, worries and nightmare start. In php json_decode fails without quotes on key, one suggest to use pear Services_JSON which achieves the safiest results if code fixed for class names and the game of invalid json is over:

<?php include("Services_JSON-1.0.3b/JSON.php"); 
//Patched version https://github.com/pear/Services_JSON/edit/trunk/JSON.php

$json = <<< JSONCODEFROMJS
   { 
      sos:presents, 
      james:'bond', 
      "agent":[0,0,7], 
      secret:"{mission:'impossible',permit: \"tokill\"}",
      go:true 
    }
JSONCODEFROMJS;

function json_fix($json) {
   $sjson = new Services_JSON(SERVICES_JSON_IN_ARR|SERVICES_JSON_USE_TO_JSON| SERVICES_JSON_LOOSE_TYPE);
   $json_array=$sjson->decode($json);
   return json_encode($json_array);
}

$json_array = json_decode(json_fix($json),true);

if(json_last_error() == JSON_ERROR_NONE) {

   $json=json_encode($json_array,JSON_PRETTY_PRINT);
   echo "<pre>";
   echo(htmlentities($json));
   echo "</pre>";
} else {
   die(json_last_error_msg());
}
?>

Upvotes: 0

I have a own jsonFixer() function - it works in two steps: removing garbage (for equality of incoherent formatting) and reformatting.

<?php
  function jsonFixer($json){
    $patterns     = [];
    /** garbage removal */
    $patterns[0]  = "/([\s:,\{}\[\]])\s*'([^:,\{}\[\]]*)'\s*([\s:,\{}\[\]])/"; //Find any character except colons, commas, curly and square brackets surrounded or not by spaces preceded and followed by spaces, colons, commas, curly or square brackets...
    $patterns[1]  = '/([^\s:,\{}\[\]]*)\{([^\s:,\{}\[\]]*)/'; //Find any left curly brackets surrounded or not by one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[2]  =  "/([^\s:,\{}\[\]]+)}/"; //Find any right curly brackets preceded by one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[3]  = "/(}),\s*/"; //JSON.parse() doesn't allow trailing commas
    /** reformatting */
    $patterns[4]  = '/([^\s:,\{}\[\]]+\s*)*[^\s:,\{}\[\]]+/'; //Find or not one or more of any character except spaces, colons, commas, curly and square brackets followed by one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[5]  = '/["\']+([^"\':,\{}\[\]]*)["\']+/'; //Find one or more of quotation marks or/and apostrophes surrounding any character except colons, commas, curly and square brackets...
    $patterns[6]  = '/(")([^\s:,\{}\[\]]+)(")(\s+([^\s:,\{}\[\]]+))/'; //Find or not one or more of any character except spaces, colons, commas, curly and square brackets surrounded by quotation marks followed by one or more spaces and  one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[7]  = "/(')([^\s:,\{}\[\]]+)(')(\s+([^\s:,\{}\[\]]+))/"; //Find or not one or more of any character except spaces, colons, commas, curly and square brackets surrounded by apostrophes followed by one or more spaces and  one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[8]  = '/(})(")/'; //Find any right curly brackets followed by quotation marks...
    $patterns[9]  = '/,\s+(})/'; //Find any comma followed by one or more spaces and a right curly bracket...
    $patterns[10] = '/\s+/'; //Find one or more spaces...
    $patterns[11] = '/^\s+/'; //Find one or more spaces at start of string...

    $replacements     = [];
    /** garbage removal */
    $replacements[0]  = '$1 "$2" $3'; //...and put quotation marks surrounded by spaces between them;
    $replacements[1]  = '$1 { $2'; //...and put spaces between them;
    $replacements[2]  = '$1 }'; //...and put a space between them;
    $replacements[3]  = '$1'; //...so, remove trailing commas of any right curly brackets;
    /** reformatting */
    $replacements[4]  = '"$0"'; //...and put quotation marks surrounding them;
    $replacements[5]  = '"$1"'; //...and replace by single quotation marks;
    $replacements[6]  = '\\$1$2\\$3$4'; //...and add back slashes to its quotation marks;
    $replacements[7]  = '\\$1$2\\$3$4'; //...and add back slashes to its apostrophes;
    $replacements[8]  = '$1, $2'; //...and put a comma followed by a space character between them;
    $replacements[9]  = ' $1'; //...and replace by a space followed by a right curly bracket;
    $replacements[10] = ' '; //...and replace by one space;
    $replacements[11] = ''; //...and remove it.

    $result = preg_replace($patterns, $replacements, $json);

    return $result;
  }
?>

Example of usage:

<?php
  // Received badly formatted json:
  // {"contact1": "David "Dave" Letterman", price : 30.00, 'details' : "Greatest 'Hits' Album"}
  $json_string = '{"contact1": "David "Dave" Letterman", price : 30.00, \'details\' : "Greatest \'Hits\' Album"}';
  jsonFixer($json_string);
?>

Will result:

{"contact1": "David \"Dave\" Letterman", "price" : "30.00", "details" : "Greatest \'Hits\' Album"}

Note: this wasn't tested with all possible badly formatted JSON strings but I use on a complex multi level JSON string and is working well until then.

Upvotes: 13

tftd
tftd

Reputation: 17032

As others have already pointed out, it's best if you tell your client for the problem with the JSON formatting. Ask them to send a bugreport to the original developer/company so they could fix it. If he/they can't fix it - then offer your solution. You simply need to addslashes the string before you json_encode it.

If for some reason you end up having to fix the formatting, here is a way that might work for you:

$data = '"contact1": "David "Dave" Letterman", "contact2": "Peter "Robert" Smith",{\'test\': \'working "something"\'}';
function replace($match){
    $key = trim($match[1]);
    $val = trim($match[2]);

    if($val[0] == '"')
        $val = '"'.addslashes(substr($val, 1, -1)).'"';
    else if($val[0] == "'")
        $val = "'".addslashes(substr($val, 1, -1))."'";

    return $key.": ".$val;
}
$preg = preg_replace_callback("#([^{:]*):([^,}]*)#i",'replace',$data);
var_dump($preg);
// string '"contact1": "David \"Dave\" Letterman", "contact2": "Peter \"Robert\" Smith",{'test': 'working \"something\"'}' (length=110)

Keep in mind this may break if somebody messes with the json format again.

Upvotes: 4

Anthony Hatzopoulos
Anthony Hatzopoulos

Reputation: 10537

Tell them to escape their strings before output. You can even offer to fix it or provide the code solution.

Otherwise you can use preg_replace with a regex expression

See Replacing specified double quotes in text with preg_replace

Upvotes: 0

Patrick M
Patrick M

Reputation: 1086

As other people have said, you can do a search and replace, but the hard part is going to be creating your fuzzy matching rules, because in order to parse it, you will need to assume some things. Probably, you will need to assume either:

1a) Keys don't contain colons
1b) or key quotes are properly escaped
and
2a) Values don't contain commas
2b) or values have properly escaped quotes.

Even then, you might get into situations where your parsing gets confused, and it gets worse if they have comments their JSON. (Not conforming, but very common.)

Now, depending on the data, you can use newlines to decide when you're looking at a new key, but again, that's not reliable and you start making big assumptions.

So, long story short you either have to make some assumptions that might be made wrong at any time, or you need to get them to fix the data.

Upvotes: 0

Related Questions