Banty
Banty

Reputation: 911

Redshift COPY command - values for some fields did not copy across

I've created a staging table which I would like to populate from a JSON file in S3 bucket using the COPY command. It worked but values for firstName, itemInSession, lastName, sessionId, userAgent, userId were not copied across.

table.py

from dwh_conn import *

drop_staging_event = "DROP TABLE IF EXISTS staging_event;"

create_staging_event = ("""
CREATE TABLE IF NOT EXISTS staging_event(
    artist VARCHAR,
    auth VARCHAR,
    firstName VARCHAR,
    gender TEXT,
    itemInSession INTEGER,
    lastName VARCHAR,
    length FLOAT,
    level TEXT,
    location VARCHAR,
    method TEXT,
    page TEXT,
    registration FLOAT,
    sessionId INTEGER,
    song VARCHAR,
    status INTEGER,
    ts BIGINT,
    userAgent VARCHAR,
    userId VARCHAR
    );""")

load_staging_event = ("""COPY staging_event FROM 's3://dwh-training/data/log_data/2018/11/'
    credentials 'aws_iam_role={}' region 'us-east-2' JSON 'auto';""").format(ROLE_ARN)

The following returned empty rows:

SELECT firstName, itemInSession, lastName, sessionId, userAgent, userId FROM staging_event LIMIT 5;
 firstname | iteminsession | lastname | sessionid | useragent | userid 
-----------+---------------+----------+-----------+-----------+--------
           |               |          |           |           | 
           |               |          |           |           | 
           |               |          |           |           | 
           |               |          |           |           | 
           |               |          |           |           | 
(5 rows)

But other fields have values:

SELECT artist, auth, gender, length, level, location, method, page, registration, song, status, ts FROM staging_event LIMIT 5;
          artist           |   auth    | gender |  length   | level |         location         | method |   page   | registration  |              song               | status |      ts       
---------------------------+-----------+--------+-----------+-------+--------------------------+--------+----------+---------------+---------------------------------+--------+---------------
 N.E.R.D. FEATURING MALICE | Logged In | M      |  288.9922 | free  | New Orleans-Metairie, LA | PUT    | NextSong | 1541033612796 | Am I High (Feat. Malice)        |    200 | 1541121934796
                           | Logged In | F      |           | free  | Lubbock, TX              | GET    | Home     | 1540708070796 |                                 |    200 | 1541122176796
 Death Cab for Cutie       | Logged In | F      | 216.42404 | free  | Lubbock, TX              | PUT    | NextSong | 1540708070796 | A Lack Of Color (Album Version) |    200 | 1541122241796
 Tracy Gang Pussy          | Logged In | F      | 221.33506 | free  | Lubbock, TX              | PUT    | NextSong | 1540708070796 | I Have A Wish                   |    200 | 1541122457796
 Skillet                   | Logged In | M      | 178.02404 | free  | Harrisburg-Carlisle, PA  | PUT    | NextSong | 1540006905796 | Monster (Album Version)         |    200 | 1541126568796
(5 rows)

Does anyone know why the empty fields were not being populated? Thanks.

Upvotes: 0

Views: 746

Answers (1)

Bill Weiner
Bill Weiner

Reputation: 11032

Banty - Redshift COPY with "json 'auto'" will only load json values that it can match with the table. This can be done on column name matching top json keys but they need to match exactly. Check that these names match EXACTLY.

A better way to match things up is to use a json_paths file that maps the json keys to the Redshift columns. Name changes can be made there as well as loading data from non-top-level json keys.

This is just the most common reason that this happens that I know about. If you need more info reply and you will likely need to provide an except of the json you are loading.

Upvotes: 1

Related Questions