Burke
Burke

Reputation: 42

How to get PostgreSQL to escape text from jsonb_array_element?

I'm loading some JSON from Postgres 13 into Elasticsearch using Logstash and ran into some errors caused by text not being escaped with reverse solidus. I tracked my problem down to this behavior:

SELECT
  json_build_object(
    'literal_text', 'abc\ndef'::text,
    'literal_text_type', pg_typeof('abc\ndef'::text),
    'text_from_jsonb_array_element', a->>0,
    'jsonb_array_element_type', pg_typeof(a->>0)
  )
FROM jsonb_array_elements('["abc\ndef"]') jae (a);
{
 "literal_text": "abc\\ndef",
 "literal_text_type": "text",
  "text_from_jsonb_array_element": "abc\ndef",
  "jsonb_array_element_type":"text"
}

db-fiddle

json_build_object encodes the literal text as expected (turning \n into \\n); however, it doesn't encode the text retrieved via jsonb_array_element even though both are text.

Why is the text extracted from jsonb_array_element being treated differently (not getting escaped by jsonb_build_object)? I've tried casting, using jsonb_array_elements_text (though my actual use case involves an array of arrays, so I need to split to a set of jsonb), and various escaping/encoding/formatting functions, but haven't found a solution yet.

Is there a trick to cast text pulled from jsonb_array_element so it will get properly encoded by jsonb_build_object?

Thanks for any hints or solutions.

Upvotes: 0

Views: 1907

Answers (1)

Blue Star
Blue Star

Reputation: 1952

Those strings look awfully similar, but they're actually different. When you create a string literal like '\n', that's a backslash character followed by an "n" character. So when you put that into json_build_object, it needs to add a backslash to escape the backslash you're giving it.

On the other hand, when you call jsonb_array_elements('["abc\ndef"]'), you're saying that the JSON has precisely a \n encoded in it with no second backslash, and therefore when it's converted to text, that \n is interpreted as a newline character, not two separate characters. You can see this easily by running the following:

SELECT a->>0 FROM jsonb_array_elements('["abc\ndef"]') a;
 ?column?
----------
 abc     +
 def
(1 row)

On encoding that back into a JSON, you get a single backslash again, because it's once again encoding a newline character.

If you want to escape it with an extra backslash, I suggest a simple replace:

SELECT 
  json_build_object(
    'text_from_jsonb_with_replace', replace(a->>0, E'\n', '\n')
  ) 
FROM jsonb_array_elements('["abc\ndef"]') jae (a);
               json_build_object
------------------------------------------------
 {"text_from_jsonb_with_replace" : "abc\\ndef"}

Upvotes: 1

Related Questions