Behrang Saeedzadeh

Reputation: 47913

How should I escape strings in JSON?

When creating JSON data manually, how should I escape string fields? Should I use something like Apache Commons Lang's StringEscapeUtils.escapeHtml or StringEscapeUtils.escapeXml, or should I use java.net.URLEncoder?

The problem is that StringEscapeUtils.escapeHtml doesn't escape quotes, so when I wrap the whole string in a pair of 's, malformed JSON is generated.

Upvotes: 183

Views: 535772

Answers (20)

dutoitns

Reputation: 2253

Please note:

This answer has two aspects that I think need to be addressed. Please look for the second aspect towards the bottom.


There is now a StringEscapeUtils#escapeJson(String) method in the Apache Commons Text library.

The methods of interest are as follows:

This functionality was initially released as part of Apache Commons Lang version 3.2 but has since been deprecated and moved to Apache Commons Text. So if the method is marked as deprecated in your IDE, you're importing the implementation from the wrong library (both libraries use the same class name: StringEscapeUtils).

The implementation isn't pure Json. As per the Javadoc:

Escapes the characters in a String using Json String rules.

Escapes any values it finds into their Json String form. Deals correctly with quotes and control-chars (tab, backslash, cr, ff, etc.)

So a tab becomes the characters '\' and 't'.

The only difference between Java strings and Json strings is that in Json, forward-slash (/) is escaped.

See http://www.ietf.org/rfc/rfc4627.txt for further details.


That said -

I would recommend getting familiar with a mature Json framework if you're going to work with Json a lot. A mature Json framework also lets you use objects (POJOs) in your code and then easily serialize them to Json (Strings). So you can encapsulate your business logic in POJOs and delegate to a utility class wrapping the Json framework wherever you need Json (Strings).

One recommendation is the Jackson project.

There is a tutorial here.

It allows a tremendous amount of customization based on your requirements. The object that you use to serialize/deserialize is called an ObjectMapper. You can configure the ObjectMapper in a myriad of ways using some of the following classes or features:

  1. DeserializationFeature
  2. SerializationFeature
  3. Date formatting: Formatting the dates in your POJO to specific formats and timezones when serializing and deserializing.
  4. And much more. Familiarize yourself with the methods (configuration options) available on the ObjectMapper....

You might also want to give a moment's thought to how you instantiate and manage the ObjectMapper in your class to ensure thread safety. Some thoughts here.

Upvotes: 27

Tolya Korablev

Reputation: 1

public static string SerializeString(string str)
    {
        var builder = new StringBuilder(str.Length+4);
        builder.Append('\"');

        char[] charArray = str.ToCharArray();
        foreach (var c in charArray)
        {
            switch (c)
            {
                case '"': builder.Append("\\\""); break;
                case '\\': builder.Append("\\\\");break;
                case '\b': builder.Append("\\b"); break;
                case '\f': builder.Append("\\f"); break;
                case '\n': builder.Append("\\n"); break;
                case '\r': builder.Append("\\r"); break;
                case '\t': builder.Append("\\t"); break;
                
                default:
                    int codepoint = Convert.ToInt32(c);
                    if ((codepoint >= 32) && (codepoint <= 126))
                    {
                        builder.Append(c);
                    }
                    else
                    {
                        builder.Append("\\u");
                        builder.Append(codepoint.ToString("x4"));
                    }
                    break;
            }
        }

        builder.Append('\"');
        return builder.ToString();
    }

https://github.com/Jackyjjc/MiniJSON.cs/blob/master/MiniJSON.cs#L497

Upvotes: 0

durette

Reputation: 424

If you just want a manual one-off solution to escape some text on the fly and are using a Windows machine, this PowerShell solution will work from a clean install of the OS with no other tools:

[PSCustomObject] @{
   'foo' = 'Hello, World!'
   'bar' = 'Goodbye, World!'
} | ConvertTo-JSON

Result:

{
    "foo":  "Hello, World!",
    "bar":  "Goodbye, World!"
}

If you're not familiar with PowerShell, these single quotes work like UNIX hard quotes; the only escape you'll need is for more single quotes. Double quotes are often more convenient if your data has single quotes, but certain characters will need to be escaped with backticks:

[PSCustomObject] @{
   'foo' = 'two single quotes: '''' and two dollar signs: $$'
   'bar' = "two single quotes: '' and two dollar signs: `$`$"
} | ConvertTo-JSON

Result:

{
    "foo":  "two single quotes: \u0027\u0027 and two dollar signs: $$",
    "bar":  "two single quotes: \u0027\u0027 and two dollar signs: $$"
}

Piping to clip.exe will output the result to your clipboard so you can paste it somewhere else:

[PSCustomObject] @{
   'foo' = 'some quotes: """'
   'bar' = 'picket fence: /\/\/\'
} | ConvertTo-JSON | clip

No result is displayed, but this is now in the user's clipboard:

{
    "foo":  "some quotes: \"\"\"",
    "bar":  "picket fence: /\\/\\/\\"
}

Upvotes: 0

Mohsen

Reputation: 3552

Apache commons-text now has a StringEscapeUtils.escapeJson(String).

Upvotes: 1

absmiths

Reputation: 1174

I think the best answer in 2017 is to use the javax.json APIs. Use javax.json.JsonBuilderFactory to create your json objects, then write the objects out using javax.json.JsonWriterFactory. Very nice builder/writer combination.

Upvotes: 0

Stefan Steiger

Reputation: 82176

The methods here that show the actual implementation are all faulty.
I don't have Java code, but just for the record, you could easily convert this C#-code:

Courtesy of the mono-project @ https://github.com/mono/mono/blob/master/mcs/class/System.Web/System.Web/HttpUtility.cs

public static string JavaScriptStringEncode(string value, bool addDoubleQuotes)
{
    if (string.IsNullOrEmpty(value))
        return addDoubleQuotes ? "\"\"" : string.Empty;

    int len = value.Length;
    bool needEncode = false;
    char c;
    for (int i = 0; i < len; i++)
    {
        c = value[i];

        if (c >= 0 && c <= 31 || c == 34 || c == 39 || c == 60 || c == 62 || c == 92)
        {
            needEncode = true;
            break;
        }
    }

    if (!needEncode)
        return addDoubleQuotes ? "\"" + value + "\"" : value;

    var sb = new System.Text.StringBuilder();
    if (addDoubleQuotes)
        sb.Append('"');

    for (int i = 0; i < len; i++)
    {
        c = value[i];
        if (c >= 0 && c <= 7 || c == 11 || c >= 14 && c <= 31 || c == 39 || c == 60 || c == 62)
            sb.AppendFormat("\\u{0:x4}", (int)c);
        else switch ((int)c)
            {
                case 8:
                    sb.Append("\\b");
                    break;

                case 9:
                    sb.Append("\\t");
                    break;

                case 10:
                    sb.Append("\\n");
                    break;

                case 12:
                    sb.Append("\\f");
                    break;

                case 13:
                    sb.Append("\\r");
                    break;

                case 34:
                    sb.Append("\\\"");
                    break;

                case 92:
                    sb.Append("\\\\");
                    break;

                default:
                    sb.Append(c);
                    break;
            }
    }

    if (addDoubleQuotes)
        sb.Append('"');

    return sb.ToString();
}

This can be compacted into

    // https://github.com/mono/mono/blob/master/mcs/class/System.Json/System.Json/JsonValue.cs
public class SimpleJSON
{

    private static  bool NeedEscape(string src, int i)
    {
        char c = src[i];
        return c < 32 || c == '"' || c == '\\'
            // Broken lead surrogate
            || (c >= '\uD800' && c <= '\uDBFF' &&
                (i == src.Length - 1 || src[i + 1] < '\uDC00' || src[i + 1] > '\uDFFF'))
            // Broken tail surrogate
            || (c >= '\uDC00' && c <= '\uDFFF' &&
                (i == 0 || src[i - 1] < '\uD800' || src[i - 1] > '\uDBFF'))
            // To produce valid JavaScript
            || c == '\u2028' || c == '\u2029'
            // Escape "</" for <script> tags
            || (c == '/' && i > 0 && src[i - 1] == '<');
    }



    public static string EscapeString(string src)
    {
        System.Text.StringBuilder sb = new System.Text.StringBuilder();

        int start = 0;
        for (int i = 0; i < src.Length; i++)
            if (NeedEscape(src, i))
            {
                sb.Append(src, start, i - start);
                switch (src[i])
                {
                    case '\b': sb.Append("\\b"); break;
                    case '\f': sb.Append("\\f"); break;
                    case '\n': sb.Append("\\n"); break;
                    case '\r': sb.Append("\\r"); break;
                    case '\t': sb.Append("\\t"); break;
                    case '\"': sb.Append("\\\""); break;
                    case '\\': sb.Append("\\\\"); break;
                    case '/': sb.Append("\\/"); break;
                    default:
                        sb.Append("\\u");
                        sb.Append(((int)src[i]).ToString("x04"));
                        break;
                }
                start = i + 1;
            }
        sb.Append(src, start, src.Length - start);
        return sb.ToString();
    }
}

Upvotes: 0

David

Reputation: 1654

Using the \uXXXX syntax can solve this problem. Search for "UTF-16" together with the name of the character to find XXXX, for example: utf-16 double quote
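Instead of searching, the XXXX value can also be computed. Here is a minimal Java sketch, assuming the character fits in a single UTF-16 code unit; the class and method names are made up for illustration:

```java
final class UnicodeEscapeLookup {

    /** Formats one UTF-16 code unit as a six-character JSON unicode escape. */
    static String toJsonEscape(char c) {
        // %04x pads the hexadecimal value to four digits.
        return String.format("\\u%04x", (int) c);
    }

    public static void main(String[] args) {
        // The double quote is code unit 0x22.
        System.out.println(toJsonEscape('"'));
    }
}
```

For the double quote this gives \u0022, which a JSON parser reads back as ".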

Upvotes: 0

webjockey

Reputation: 1755

If you need to escape JSON inside a JSON string, org.json.JSONObject.quote("your json string that needs to be escaped") seems to work well.

Upvotes: 1

Dhiraj

Reputation: 560

If you are using fasterxml jackson, you can use the following: com.fasterxml.jackson.core.io.JsonStringEncoder.getInstance().quoteAsString(input)

If you are using codehaus jackson, you can use the following: org.codehaus.jackson.io.JsonStringEncoder.getInstance().quoteAsString(input)

Upvotes: 6

I.G. Pascual

Reputation: 5965

org.json.JSONObject quote(String data) method does the job

import org.json.JSONObject;
String jsonEncodedString = JSONObject.quote(data);

Extract from the documentation:

Encodes data as a JSON string. This applies quotes and any necessary character escaping. [...] Null will be interpreted as an empty string

Upvotes: 13

dpetruha

Reputation: 1284

Try this org.codehaus.jettison.json.JSONObject.quote("your string").

Download it here: http://mvnrepository.com/artifact/org.codehaus.jettison/jettison

Upvotes: 38

orip

Reputation: 75427

Consider Moshi's JsonWriter class. It has a wonderful API and it reduces copying to a minimum; everything can be nicely streamed to a file, OutputStream, etc.

OutputStream os = ...;
JsonWriter json = new JsonWriter(Okio.buffer(Okio.sink(os)));
json.beginObject();
json.name("id").value(getId());
json.name("scores");
json.beginArray();
for (Double score : getScores()) {
  json.value(score);
}
json.endArray();
json.endObject();

If you want the string in hand:

Buffer b = new Buffer(); // okio.Buffer
JsonWriter writer = new JsonWriter(b);
//...
String jsonString = b.readUtf8();

Upvotes: 2

J28

Reputation: 1110

Use the StringEscapeUtils class in the Commons Lang API.

StringEscapeUtils.escapeJavaScript("Your JSON string");

Upvotes: 2

vijucat

Reputation: 2088

For those who came here looking for a command-line solution, like me, cURL's --data-urlencode works fine:

curl -G -v -s --data-urlencode 'query={"type" : "/music/artist"}' 'https://www.googleapis.com/freebase/v1/mqlread'

sends

GET /freebase/v1/mqlread?query=%7B%22type%22%20%3A%20%22%2Fmusic%2Fartist%22%7D HTTP/1.1

, for example. Larger JSON data can be put in a file, and you'd use the @ syntax to name the file that the to-be-escaped data is read from. For example, if

$ cat 1.json 
{
  "type": "/music/artist",
  "name": "The Police",
  "album": []
}

you'd use

curl -G -v -s --data-urlencode query@1.json 'https://www.googleapis.com/freebase/v1/mqlread'

And now, this is also a tutorial on how to query Freebase from the command line :-)
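Since the question mentions java.net.URLEncoder, here is a rough Java counterpart of the curl call above. It is only a sketch: the class and method names are made up, and note that URLEncoder writes a space as + where curl writes %20. This is URL encoding for putting JSON into a query string, not JSON string escaping.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class UrlEncodeJsonSketch {

    /** URL-encodes a JSON document so it can be used as a query parameter value. */
    static String encodeQuery(String json) {
        try {
            return URLEncoder.encode(json, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints query=%7B%22type%22+%3A+%22%2Fmusic%2Fartist%22%7D
        System.out.println("query=" + encodeQuery("{\"type\" : \"/music/artist\"}"));
    }
}
```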

Upvotes: 2

Thanatos

Reputation: 44256

Ideally, find a JSON library in your language that you can feed some appropriate data structure to, and let it worry about how to escape things. It'll keep you much saner. If for whatever reason you don't have a library in your language, you don't want to use one (I wouldn't suggest this¹), or you're writing a JSON library, read on.

Escape it according to the RFC. JSON is pretty liberal: The only characters you must escape are \, ", and control codes (anything less than U+0020).

This structure of escaping is specific to JSON. You'll need a JSON-specific function. All of the escapes can be written as \uXXXX where XXXX is the UTF-16 code unit¹ for that character. There are a few shortcuts, such as \\, which work as well. (And they result in smaller and clearer output.)

For full details, see the RFC.
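If you do end up writing it by hand, the rules above come down to something like the following Java sketch. It is not taken from any library; the class and method names are made up. It escapes only what must be escaped (", \ and anything below U+0020), uses the short forms where they exist, and leaves everything else untouched.

```java
final class JsonEscapeSketch {

    /** Returns s as a quoted JSON string, escaping only the characters JSON requires. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 2);
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\b': sb.append("\\b"); break;
                case '\f': sb.append("\\f"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters get the generic four-digit hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        // Everything else, including non-ASCII text, is emitted as-is.
                        sb.append(c);
                    }
            }
        }
        sb.append('"');
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "say \"hi\"\tnow"
        System.out.println(quote("say \"hi\"\tnow"));
    }
}
```

Because non-ASCII characters pass through unchanged, the output has to be written in a Unicode encoding such as UTF-8.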

¹JSON's escaping is built on JS, so it uses \uXXXX, where XXXX is a UTF-16 code unit. For code points outside the BMP, this means encoding surrogate pairs, which can get a bit hairy. (Or, you can just output the character directly, since JSON's encoded form is Unicode text, and allows these particular characters.)
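To make the surrogate pair case concrete, here is a small Java sketch with made-up class and method names. It assumes the input is a valid Unicode code point and relies on the standard Character helpers to split it.

```java
final class SurrogateEscapeSketch {

    /** Escapes one code point as one or two JSON unicode escapes. */
    static String escapeCodePoint(int codePoint) {
        if (Character.isBmpCodePoint(codePoint)) {
            // Inside the BMP a single UTF-16 code unit is enough.
            return String.format("\\u%04x", codePoint);
        }
        // Outside the BMP the code point is split into a high and a low surrogate.
        return String.format("\\u%04x\\u%04x",
                (int) Character.highSurrogate(codePoint),
                (int) Character.lowSurrogate(codePoint));
    }

    public static void main(String[] args) {
        // U+1D11E (musical G clef) lies outside the BMP; prints two escapes, d834 then dd1e.
        System.out.println(escapeCodePoint(0x1D11E));
    }
}
```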

Upvotes: 188

Hanubindh Krishna

Reputation: 85

StringEscapeUtils.escapeJavaScript / StringEscapeUtils.escapeEcmaScript should do the trick too.

Upvotes: 6

MonoThreaded

Reputation: 12033

Extract From Jettison:

 public static String quote(String string) {
         if (string == null || string.length() == 0) {
             return "\"\"";
         }

         char         c = 0;
         int          i;
         int          len = string.length();
         StringBuilder sb = new StringBuilder(len + 4);
         String       t;

         sb.append('"');
         for (i = 0; i < len; i += 1) {
             c = string.charAt(i);
             switch (c) {
             case '\\':
             case '"':
                 sb.append('\\');
                 sb.append(c);
                 break;
             case '/':
 //                if (b == '<') {
                     sb.append('\\');
 //                }
                 sb.append(c);
                 break;
             case '\b':
                 sb.append("\\b");
                 break;
             case '\t':
                 sb.append("\\t");
                 break;
             case '\n':
                 sb.append("\\n");
                 break;
             case '\f':
                 sb.append("\\f");
                 break;
             case '\r':
                sb.append("\\r");
                break;
             default:
                 if (c < ' ') {
                     t = "000" + Integer.toHexString(c);
                     sb.append("\\u" + t.substring(t.length() - 4));
                 } else {
                     sb.append(c);
                 }
             }
         }
         sb.append('"');
         return sb.toString();
     }

Upvotes: 60

Dan-Dev

Reputation: 9430

org.json.simple.JSONObject.escape() escapes quotes, \, /, \r, \n, \b, \f, \t and other control characters. It can be used to escape JavaScript code.

import org.json.simple.JSONObject;
String test =  JSONObject.escape("your string");

Upvotes: 24

Tjunkie

Reputation: 497

I have not spent the time to make 100% certain, but it worked well enough on my inputs for the result to be accepted by online JSON validators:

new org.apache.velocity.tools.generic.EscapeTool().java("input")

although it does not look any better than org.codehaus.jettison.json.JSONObject.quote("your string")

I simply use velocity tools in my project already - my "manual JSON" building was within a velocity template

Upvotes: 2

Vladimir

Reputation: 2553

Not sure what you mean by "creating json manually", but you can use something like gson (http://code.google.com/p/google-gson/), and that would transform your HashMap, Array, String, etc, to a JSON value. I recommend going with a framework for this.

Upvotes: 3
