anil

Reputation: 668

Loading JSON data from a file into Postgres

I need to load data from multiple JSON files, each containing multiple records, into a Postgres table. I am using the following code, but it does not work (I am using pgAdmin III on Windows):

COPY tbl_staging_eventlog1 ("EId", "Category", "Mac", "Path", "ID")
from 'C:\\SAMPLE.JSON' 
delimiter ','
;

The content of SAMPLE.JSON looks like this (showing two records out of many):

[{"EId":"104111","Category":"(0)","Mac":"ABV","Path":"C:\\Program Files (x86)\\Google","ID":"System.Byte[]"},{"EId":"104110","Category":"(0)","Mac":"BVC","Path":"C:\\Program Files (x86)\\Google","ID":"System.Byte[]"}]

Upvotes: 13

Views: 39220

Answers (3)

Christian

Reputation: 7310

Try this:

BEGIN;
-- stage the raw file into a single text column
-- (the column is named doc here; "values" is a reserved word in Postgres)
create temporary table temp_json (doc text) on commit drop;
copy temp_json from 'C:\SAMPLE.JSON';

-- COPY's text format consumes backslashes, so re-escape them before
-- casting to json, then unnest the array into one row per record
insert into tbl_staging_eventlog1 ("EId", "Category", "Mac", "Path", "ID")
select rec->>'EId'      as "EId",
       rec->>'Category' as "Category",
       rec->>'Mac'      as "Mac",
       rec->>'Path'     as "Path",
       rec->>'ID'       as "ID"
from   (
           select json_array_elements(replace(doc, '\', '\\')::json) as rec
           from   temp_json
       ) a;
COMMIT;
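
If reading the file server-side with elevated privileges is acceptable, pg_read_file avoids COPY's backslash mangling entirely, since it returns the file contents untouched. A minimal sketch, not part of the answer above, assuming PostgreSQL 11+ (reading an absolute path requires superuser or the pg_read_server_files role) and jsonb:

-- pg_read_file reads from the server's filesystem, so the path must be
-- visible to the Postgres server process, not to the pgAdmin client
insert into tbl_staging_eventlog1 ("EId", "Category", "Mac", "Path", "ID")
select rec->>'EId', rec->>'Category', rec->>'Mac', rec->>'Path', rec->>'ID'
from   jsonb_array_elements(pg_read_file('C:\SAMPLE.JSON')::jsonb) as rec;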

Upvotes: 32

Daniel C. Moura

Reputation: 345

You can use spyql. Running the following command would generate INSERT statements that you can pipe into psql:

$ jq -c .[] *.json | spyql -Otable=tbl_staging_eventlog1 "SELECT json->EId, json->Category, json->Mac, json->Path, json->ID FROM json TO sql"
INSERT INTO "tbl_staging_eventlog1"("EId","Category","Mac","Path","ID") VALUES ('104111','(0)','ABV','C:\Program Files (x86)\Google','System.Byte[]'),('104110','(0)','BVC','C:\Program Files (x86)\Google','System.Byte[]');

jq transforms the JSON arrays from all the JSON files in the current directory into JSON lines (one JSON object per line), and spyql then converts those JSON lines into INSERT statements.

To import the data into PostgreSQL:

$ jq -c .[] *.json | spyql -Otable=tbl_staging_eventlog1 "SELECT json->EId, json->Category, json->Mac, json->Path, json->ID FROM json TO sql" | psql -U your_user_name -h your_host your_database

Disclaimer: I am the author of spyql.

Upvotes: 0

Ferris

Reputation: 5601

As mentioned in Andrew Dunstan's PostgreSQL and Technical blog:

In text mode, COPY will be simply defeated by the presence of a backslash in the JSON. So, for example, any field that contains an embedded double quote mark, or an embedded newline, or anything else that needs escaping according to the JSON spec, will cause failure. And in text mode you have very little control over how it works - you can't, for example, specify a different ESCAPE character. So text mode simply won't work.

so we have to fall back on CSV mode:

-- pick QUOTE and DELIMITER characters that can never occur in the
-- data, so each line of the file is loaded verbatim into jsonfield
copy the_table(jsonfield) 
from '/path/to/jsondata' 
csv quote e'\x01' delimiter e'\x02';
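
That statement loads each line of the file verbatim into jsonfield, but the question still needs the array unnested into columns. A follow-up sketch, assuming jsonfield was declared jsonb and PostgreSQL 9.4+ (the_table and jsonfield are the placeholders above):

-- one output row per element of each loaded JSON array
insert into tbl_staging_eventlog1 ("EId", "Category", "Mac", "Path", "ID")
select rec->>'EId', rec->>'Category', rec->>'Mac', rec->>'Path', rec->>'ID'
from   the_table,
       jsonb_array_elements(jsonfield) as rec;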

The official documentation for COPY (sql-copy) lists these parameters, among others:

COPY table_name [ ( column_name [, ...] ) ]
    FROM { 'filename' | PROGRAM 'command' | STDIN }
    [ [ WITH ] ( option [, ...] ) ]
    [ WHERE condition ]

where option can be one of:

    FORMAT format_name
    FREEZE [ boolean ]
    DELIMITER 'delimiter_character'
    NULL 'null_string'
    HEADER [ boolean ]
    QUOTE 'quote_character'
    ESCAPE 'escape_character'
    FORCE_QUOTE { ( column_name [, ...] ) | * }
    FORCE_NOT_NULL ( column_name [, ...] )
    FORCE_NULL ( column_name [, ...] )
    ENCODING 'encoding_name'
  • FORMAT
    • Selects the data format to be read or written: text, csv (Comma Separated Values), or binary. The default is text.
  • QUOTE
    • Specifies the quoting character to be used when a data value is quoted. The default is double-quote. This must be a single one-byte character. This option is allowed only when using CSV format.
  • DELIMITER
    • Specifies the character that separates columns within each row (line) of the file. The default is a tab character in text format, a comma in CSV format. This must be a single one-byte character. This option is not allowed when using binary format.
  • NULL
    • Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don't want to distinguish nulls from empty strings. This option is not allowed when using binary format.
  • HEADER
    • Specifies that the file contains a header line with the names of each column in the file. On output, the first line contains the column names from the table, and on input, the first line is ignored. This option is allowed only when using CSV format.
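
For reference, the same control-character trick written with the option syntax from the synopsis above (same placeholders as before):

copy the_table(jsonfield)
from '/path/to/jsondata'
with (format csv, quote e'\x01', delimiter e'\x02');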

Upvotes: 0
