Peter Krauss
Peter Krauss

Reputation: 13920

How to ingest JSON from TXT file?

The problem is not to get TXT because it is

SELECT to_jsonb(file_get_contents('/tmp/test.json'))

returns not a JSON object but a string... Even with replace(txt,E'\n',' ') it not works. How to normalize and really convert string to JSON?

PS: I am using JSONb that must be some thing as JSON for ingestion.


NOTES

The json file /tmp/test.json:

[
    {
      "foo": "etc",
      "bar": "etc",
      "x": 123
    },
    {
      "foo": "aaa",
      "bar": "bbb",
      "x": 456
    }
]

I am using UBUNTU 18 LTS, PostgreSQL v12 and file in linux standard TXT. Termial command file -i /tmp/test.json say that is all good, "text/plain; charset=utf-8".

To load full-text into only one field (seems impossible with COPY FROM), PostgreSQL is ugly, but this function was tested and is reliable:

CREATE EXTENSION PLpython3U;
CREATE or replace FUNCTION file_get_contents(p_file text) RETURNS text AS $$
   import os.path
   if not os.path.isfile(args[0]):
       return None
   with open(args[0],"r") as content_file:
       content = content_file.read()
   return content
$$ LANGUAGE PLpython3U;

Upvotes: 0

Views: 513

Answers (2)

jjanes
jjanes

Reputation: 44137

It is already in a JSON format you just have to cast it:

SELECT file_get_contents('/tmp/test.json')::jsonb

The docs for "to_json" say

For any scalar type other than a number, a Boolean, or a null value, the text representation will be used, in such a fashion that it is a valid json or jsonb value.

In other words, it does the escaping and quoting necessary to turn your file into a single JSON value. Not a JSON object, just a single value. So "to_json" is the wrong tool for the job.

Upvotes: 1

William Prigol Lopes
William Prigol Lopes

Reputation: 1889

Using postgresql, I saved your file on my /tmp directory.

To read the file, if the file is on your postgresql server, you can use pg_read_file().

To simulate your scenario I tried the following:

APPROACH 1 (json_array_elements_text):

SELECT JSON_ARRAY_ELEMENTS_TEXT(
       REPLACE(PG_READ_FILE('/tmp/teste.txt'), E'\n', '')::JSON);

This series of functions is very similar that you used, except by the pg_read_file that reads file on server and json_array_elements_text.

The result is:

                  json_array_elements_text                  
------------------------------------------------------------
 {      "foo": "etc",      "bar": "etc",      "x": 123    }
 {      "foo": "aaa",      "bar": "bbb",      "x": 456    }
(2 rows)

APPROACH 2 (json_to_recordset):

select * from json_to_recordset(replace(pg_read_file('/tmp/teste.txt'), E'\n', '')::json) as ("foo" varchar, "bar" varchar, "x" int);
 foo | bar |  x  
-----+-----+-----
 etc | etc | 123
 aaa | bbb | 456
(2 rows)

The result is similar than the first example but, in this approach we define columns based on elements found on original json data

Upvotes: 1

Related Questions