any_random

Reputation: 23

Snowflake external table from .csv file under S3

Let's assume I have .csv file like:

event,user
1,123
2,321

This .csv file is located under s3.

Running the following SQL to create an external table (with @TEST_STAGE already created and pointing to the correct S3 path):

CREATE OR REPLACE EXTERNAL TABLE TEST_CSV_TABLE1(
  event_id VARCHAR AS (value:$1::varchar),
  user_id VARCHAR AS (value:$2::varchar)
)
WITH LOCATION = @TEST_STAGE
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1);

Querying this table results in the following output:

|-----|----------------------------|----------|---------|
| Row |           VALUE            | EVENT_ID | USER_ID |
|-----|----------------------------|----------|---------|
|  1  | { "c1": "1", "c2": "123" } |   NULL   |   NULL  |
|-----|----------------------------|----------|---------|
|  2  | { "c1": "2", "c2": "321" } |   NULL   |   NULL  |
|-----|----------------------------|----------|---------|

However, if I just create a table as

CREATE OR REPLACE TABLE TEST_CSV_TABLE2(
  event_id VARCHAR,
  user_id VARCHAR
);

and load the same file like:

COPY INTO TEST_CSV_TABLE2 FROM @TEST_STAGE
FILES = ('test.csv')
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1);

or even like:

COPY INTO TEST_CSV_TABLE2
FROM (
  SELECT
    t.$1,
    t.$2
  FROM @TEST_STAGE t)
FILES = ('test.csv')
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1);

This results in properly assigned columns:

|-----|----------|---------|
| Row | EVENT_ID | USER_ID |
|-----|----------|---------|
|  1  |     1    |   123   |
|-----|----------|---------|
|  2  |     2    |   321   |
|-----|----------|---------|

Why are the columns not picked up properly in the case of the external table? Many thanks ahead.

Upvotes: 2

Views: 6650

Answers (1)

David Garrison

Reputation: 2880

You need to use the name of the column when you are pulling it out of the JSON. What you have creates the JSON VALUE column and then parses it for attributes in the JSON called "$1" and "$2". When it doesn't find such an attribute, it returns NULL for that column.

CREATE OR REPLACE EXTERNAL TABLE TEST_CSV_TABLE1(
  event_id VARCHAR AS (value:c1::varchar),
  user_id VARCHAR AS (value:c2::varchar)
)
WITH LOCATION = @TEST_STAGE
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1);
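
As a quick check (a minimal sketch; it assumes the stage path is correct and the new files have been registered), you can refresh the external table's metadata and then query the virtual columns directly:

-- Refresh the external table metadata so files under the stage are registered,
-- then confirm the virtual columns are populated instead of NULL.
ALTER EXTERNAL TABLE TEST_CSV_TABLE1 REFRESH;

SELECT event_id, user_id
FROM TEST_CSV_TABLE1;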

Using COPY INTO with $1 and $2 isn't using those to parse JSON as above; that is syntax specific to a COPY INTO query for referencing the columns in a file.
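
For comparison, a sketch of that positional syntax in a direct stage query (assuming the stage's file format already handles the delimiter and header):

-- In a direct stage query (and in COPY INTO), $1 and $2 are positional file
-- columns read with the stage's file format, not JSON attributes of VALUE.
SELECT t.$1 AS event_id, t.$2 AS user_id
FROM @TEST_STAGE t;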

Upvotes: 6
