Reputation: 911
I've created a staging table which I would like to populate from a JSON file in an S3 bucket using the COPY command. It worked, but the values for firstName, itemInSession, lastName, sessionId, userAgent, and userId were not copied across.
table.py
from dwh_conn import *  # provides ROLE_ARN and connection settings

drop_staging_event = "DROP TABLE IF EXISTS staging_event;"

create_staging_event = ("""
CREATE TABLE IF NOT EXISTS staging_event (
    artist        VARCHAR,
    auth          VARCHAR,
    firstName     VARCHAR,
    gender        TEXT,
    itemInSession INTEGER,
    lastName      VARCHAR,
    length        FLOAT,
    level         TEXT,
    location      VARCHAR,
    method        TEXT,
    page          TEXT,
    registration  FLOAT,
    sessionId     INTEGER,
    song          VARCHAR,
    status        INTEGER,
    ts            BIGINT,
    userAgent     VARCHAR,
    userId        VARCHAR
);""")

load_staging_event = ("""
COPY staging_event FROM 's3://dwh-training/data/log_data/2018/11/'
credentials 'aws_iam_role={}' region 'us-east-2' JSON 'auto';
""").format(ROLE_ARN)
The following query returned only empty values for those columns:
SELECT firstName, itemInSession, lastName, sessionId, userAgent, userId FROM staging_event LIMIT 5;
firstname | iteminsession | lastname | sessionid | useragent | userid
-----------+---------------+----------+-----------+-----------+--------
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
(5 rows)
But other fields have values:
SELECT artist, auth, gender, length, level, location, method, page, registration, song, status, ts FROM staging_event LIMIT 5;
artist | auth | gender | length | level | location | method | page | registration | song | status | ts
---------------------------+-----------+--------+-----------+-------+--------------------------+--------+----------+---------------+---------------------------------+--------+---------------
N.E.R.D. FEATURING MALICE | Logged In | M | 288.9922 | free | New Orleans-Metairie, LA | PUT | NextSong | 1541033612796 | Am I High (Feat. Malice) | 200 | 1541121934796
| Logged In | F | | free | Lubbock, TX | GET | Home | 1540708070796 | | 200 | 1541122176796
Death Cab for Cutie | Logged In | F | 216.42404 | free | Lubbock, TX | PUT | NextSong | 1540708070796 | A Lack Of Color (Album Version) | 200 | 1541122241796
Tracy Gang Pussy | Logged In | F | 221.33506 | free | Lubbock, TX | PUT | NextSong | 1540708070796 | I Have A Wish | 200 | 1541122457796
Skillet | Logged In | M | 178.02404 | free | Harrisburg-Carlisle, PA | PUT | NextSong | 1540006905796 | Monster (Album Version) | 200 | 1541126568796
(5 rows)
Does anyone know why these fields were not populated? Thanks.
Upvotes: 0
Views: 746
Reputation: 11032
Banty - Redshift COPY with "json 'auto'" will only load JSON values that it can match to the table. Matching is done by comparing column names to the top-level JSON keys, and they need to match exactly. Check that these names match EXACTLY.
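In your case the names almost certainly don't match: Redshift folds unquoted identifiers to lowercase, so the table's columns are actually firstname, iteminsession, sessionid, and so on, while the event-log keys are camelCase (firstName, itemInSession, ...), and those are exactly the columns that came back empty. If that is what's happening (I can't confirm without seeing your JSON), COPY also accepts JSON 'auto ignorecase', which matches keys to columns case-insensitively. A minimal sketch against your existing table.py:

# Sketch of a one-line change to table.py: 'auto ignorecase' makes COPY
# match the camelCase JSON keys to Redshift's lowercased column names.
load_staging_event = ("""
COPY staging_event FROM 's3://dwh-training/data/log_data/2018/11/'
credentials 'aws_iam_role={}' region 'us-east-2' JSON 'auto ignorecase';
""").format(ROLE_ARN)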
A better way to match things up is to use a jsonpaths file that maps the JSON keys to the Redshift columns, as sketched below. Name mismatches can be handled there, and you can also load data from non-top-level JSON keys.
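For example (a sketch; the jsonpaths S3 location below is made up, put the file wherever suits you), the file lists one JSONPath expression per column, in the same order as the table's columns, and the COPY command then points at it instead of 'auto':

-- staging_event_paths.json: one entry per column, in table column order
{
    "jsonpaths": [
        "$.artist",
        "$.auth",
        "$.firstName",
        "$.gender",
        "$.itemInSession",
        "$.lastName",
        "$.length",
        "$.level",
        "$.location",
        "$.method",
        "$.page",
        "$.registration",
        "$.sessionId",
        "$.song",
        "$.status",
        "$.ts",
        "$.userAgent",
        "$.userId"
    ]
}

# Sketch of the corresponding COPY in table.py: replace 'auto' with the
# S3 path of the jsonpaths file (hypothetical location shown here).
load_staging_event = ("""
COPY staging_event FROM 's3://dwh-training/data/log_data/2018/11/'
credentials 'aws_iam_role={}' region 'us-east-2'
JSON 's3://dwh-training/jsonpaths/staging_event_paths.json';
""").format(ROLE_ARN)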
This is just the most common reason I know of for this happening. If you need more info, reply; you will likely need to provide an excerpt of the JSON you are loading.
Upvotes: 1