Reputation: 915
I would like to import JSON data into Postgres. The data is on the order of a million rows per file, with file sizes ranging from 700 MB to 3 GB.
Here's a sample I created based on the structure of the data I have. I tried importing it into Postgres but I get an error.
Sample(1) data
{"offers":{"offer":[
{"url": "https://some1-value.com", "nested": {"id":4,"value":"some1 text value"}, "quotes": "5\" side"},
{"url": "https://some2-value.com", "nested": {"id":5,"value":"some2 text value"}, "quotes": "6\" side"},
{"url": "https://some3-value.com", "nested": {"id":6,"value":"some3 text value"}, "quotes": "7\" side"}]}}
Command I used and the error I got
# copy contrial from '/home/ubuntu/sample-data.json';
ERROR: invalid input syntax for type json
DETAIL: The input string ended unexpectedly.
CONTEXT: JSON data, line 1: {"offers":{"offer":[
COPY contrial, line 1, column info: "{"offers":{"offer":["
I modified the file to remove the first two keys, leaving just a list of JSON objects as below, but I still get an error.
Sample(2) data
[
{"url": "https://some1-value.com", "nested": {"id":4,"value":"some1 text value"}, "quotes": "5\" side"},
{"url": "https://some2-value.com", "nested": {"id":5,"value":"some2 text value"}, "quotes": "6\" side"},
{"url": "https://some3-value.com", "nested": {"id":6,"value":"some3 text value"}, "quotes": "7\" side"}]
Error
# copy contrial from '/home/ubuntu/sample2-data.json';
ERROR: invalid input syntax for type json
DETAIL: The input string ended unexpectedly.
CONTEXT: JSON data, line 1: [
COPY contrial, line 1, column info: "["
Sample(3) data I modified further
[{"url": "https://some1-value.com", "nested": {"id":4,"value":"some1 text value"}, "quotes": "5\" side"},
{"url": "https://some2-value.com", "nested": {"id":5,"value":"some2 text value"}, "quotes": "6\" side"},
{"url": "https://some3-value.com", "nested": {"id":6,"value":"some3 text value"}, "quotes": "7\" side"}]
Different Error
# copy contrial from '/home/ubuntu/sample2-data.json';
ERROR: invalid input syntax for type json
DETAIL: Token "side" is invalid.
CONTEXT: JSON data, line 1: ...,"value":"some1 text value"}, "quotes": "5" side...
COPY contrial, line 1, column info: "[{"url": "https://some1-value.com", "nested": {"id":4,"value":"some1 text value"}, "quotes": "5" si..."
Create table statement
CREATE TABLE public.contrial (
info json NOT NULL
);
The end goal is to create a table with the JSON keys as columns and their values as rows; nested keys would need to be flattened:
+-------------------------+-----------+------------------+---------+
| url                     | nested_id | nested_value     | quotes  |
+-------------------------+-----------+------------------+---------+
| https://some1-value.com | 4         | some1 text value | 5" side |
+-------------------------+-----------+------------------+---------+
| https://some2-value.com | 5         | some2 text value | 6" side |
+-------------------------+-----------+------------------+---------+
| https://some3-value.com | 6         | some3 text value | 7" side |
+-------------------------+-----------+------------------+---------+
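Assuming each array element can be loaded as one row into contrial.info, I believe a query along these lines (using Postgres's json -> and ->> operators) should produce that shape:

SELECT info ->> 'url'                   AS url,
       (info -> 'nested' ->> 'id')::int AS nested_id,
       info -> 'nested' ->> 'value'     AS nested_value,
       info ->> 'quotes'                AS quotes
FROM contrial;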
Upvotes: 2
Views: 1689
Reputation: 915
I ended up using Andre Dunstan's blog and this SO answer, both of which say to format the JSON in a specific way (one object per line) so it can be loaded with the COPY command.
Since the structure of the files I'm parsing is pretty well defined, I ended up with the following script.
def file_len(fname):
    # Find the number of lines in the file.
    # Has been pretty efficient even for millions of records.
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1

INPUTFILE = '/path/to/input.json'
OUTPUTFILE = '/path/to/output.json.csv'
LEN = file_len(INPUTFILE)

with open(OUTPUTFILE, 'w') as fo:
    with open(INPUTFILE, 'r') as fi:
        for i, l in enumerate(fi):
            # Skip the first line ('{"offers":{"offer":[')
            if i == 0: continue
            # Last line: remove the trailing ']}}'
            elif i + 1 == LEN: _ = fo.write(l[:-3])
            # Other lines: remove the trailing ',' and re-add '\n',
            # since write() does not append a newline on its own
            else: _ = fo.write(l[:-2] + '\n')
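Running this over the Sample(1) file above leaves one JSON object per line:

{"url": "https://some1-value.com", "nested": {"id":4,"value":"some1 text value"}, "quotes": "5\" side"}
{"url": "https://some2-value.com", "nested": {"id":5,"value":"some2 text value"}, "quotes": "6\" side"}
{"url": "https://some3-value.com", "nested": {"id":6,"value":"some3 text value"}, "quotes": "7\" side"}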
# load statement
import sqlalchemy

# USERNAME, PASSWORD, HOSTNAME and DB are placeholders for the actual credentials.
POSTGRESQL = f'postgresql+psycopg2://{USERNAME}:{PASSWORD}@{HOSTNAME}/{DB}'
engine = sqlalchemy.create_engine(POSTGRESQL, echo=True)
con = engine.connect()
trans = con.begin()

# Python resolves \x01 and \x02 to the control characters themselves,
# which Postgres accepts inside the E'...' literals.
LOAD_SQL = f"COPY tablename FROM '{OUTPUTFILE}' WITH csv DELIMITER E'\x01' QUOTE E'\x02' NULL AS '';"
try:
    con.execute(sqlalchemy.text(LOAD_SQL))
    trans.commit()
except Exception:
    trans.rollback()
    raise
finally:
    con.close()
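For reference, the COPY the script issues is equivalent to running the following in psql (tablename is the placeholder from the script). The point of the odd options is that CSV mode, unlike COPY's default text format, does not interpret backslashes, so the \" sequences survive into the json column (the text format ate them, which is what produced the Token "side" is invalid error above). Picking control characters that can never appear in the JSON as the delimiter and quote means each line lands verbatim in the single column. COPY also reads exactly one input row per line, which is why the pretty-printed files failed with "The input string ended unexpectedly".

COPY tablename FROM '/path/to/output.json.csv'
WITH (FORMAT csv, DELIMITER E'\x01', QUOTE E'\x02', NULL '');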
Upvotes: 3