Reputation: 43
I am trying to load data from a Parquet file in AWS S3 into a Snowflake table, but I am getting the error below. Could you please help?
SQL compilation error: PARQUET file format can produce one and only one column of type variant or object or array.
Use CSV file format if you want to load more than one column.
Parquet file schema
|-- uuid: string (nullable = true)
|-- event_timestamp: timestamp (nullable = true)
|-- params: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- id: string (nullable = true)
| | |-- name: string (nullable = true)
| | |-- type: string (nullable = true)
| | |-- value: string (nullable = true)
Here is the sample data:
uuid,event_timestamp,params
3f230ea5-dd52-4cf9-bdde-b79201eb1001,2020-05-10 17:06:21.524,[{id=501, type=custom, name=filtering, value=true}, {id=502, type=custom, name=select, value=false}]
Snowflake table:
create or replace table temp_log (
uuid string,
event_timestamp timestamp,
params array);
I am using the below COPY command to load the data:
copy into temp_log
from '<<s3 path>>'
pattern = '.*\.parquet'
storage_integration = <<integration object>>
file_format = (
    type = parquet
    compression = snappy
);
Upvotes: 2
Views: 7462
Reputation: 554
I use these two SQL statements to load the data into a table.
First, use this SQL to generate the CREATE TABLE statement:
with cols as (
    select COLUMN_NAME || ' ' || TYPE as col
    from table(
        infer_schema(
            location => '@LANDING/myFile.parquet',
            file_format => 'LANDING.default_parquet'
        )
    )
),
temp as (
    -- seq keeps the fragments in order; plain UNION gives no ordering guarantee
    select 1 as seq, 'create or replace table myTable (' as col1
    union all
    select 2, listagg(col, ',')
    from cols
    union all
    select 3, ')'
)
select listagg(col1, ' ') within group (order by seq)
from temp;
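As an aside, newer Snowflake versions can skip the string stitching and build the table directly from the inferred schema with CREATE TABLE ... USING TEMPLATE. A minimal sketch, assuming the same stage and file format as above:

-- Build the table in one statement from the inferred schema
create or replace table myTable
using template (
    select array_agg(object_construct(*))
    from table(
        infer_schema(
            location => '@LANDING/myFile.parquet',
            file_format => 'LANDING.default_parquet'
        )
    )
);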
Second, use this SQL to generate the COPY INTO statement that loads the data into the table:
with cols as (
    select expression
    from table(
        infer_schema(
            location => '@LANDING/myFile.parquet',
            file_format => 'LANDING.default_parquet'
        )
    )
),
temp as (
    -- seq keeps the fragments in order; plain UNION gives no ordering guarantee
    select 1 as seq, 'copy into myTable from ( select ' as col1
    union all
    select 2, listagg(expression, ',')
    from cols
    union all
    select 3, 'from @LANDING/myFile.parquet )'
)
select listagg(col1, ' ') within group (order by seq)
from temp;
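If you would rather not generate the SELECT list at all, COPY can also match Parquet fields to table columns by name. A sketch, assuming the field names in the file match the table's column names:

-- Let COPY pair Parquet fields with table columns by name
copy into myTable
from @LANDING/myFile.parquet
file_format = (format_name = 'LANDING.default_parquet')
match_by_column_name = case_insensitive;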
Upvotes: 0
Reputation: 5650
In my case, the error was raised because I was running the command as:
COPY INTO my_db.my_schema.my_table
FROM (
SELECT *
FROM @my_stage
) FILE_FORMAT = ( TYPE = PARQUET );
Instead, each column should be specified as $1:my_column in the SELECT statement, for example:
COPY INTO my_db.my_schema.my_table
FROM (
SELECT $1:my_column1, $1:my_column2, $1:my_column3
FROM @my_stage
) FILE_FORMAT = ( TYPE = PARQUET );
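Applied to the table from the question, that pattern might look like the sketch below (the stage name is a placeholder, and the casts are assumptions about the target column types):

COPY INTO temp_log
FROM (
    SELECT
        $1:uuid::string,
        $1:event_timestamp::timestamp,
        $1:params::array
    FROM @my_stage
) FILE_FORMAT = ( TYPE = PARQUET );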
Upvotes: 0
Reputation: 43
This issue was resolved after creating the table as below:
create or replace table temp_log (
logcontent VARIANT);
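For completeness, a sketch of what the load and a later query against that single VARIANT column might look like (the stage name is a placeholder, and the casts assume the Parquet types come through as shown in the question's schema):

-- Load each Parquet row as one VARIANT value
copy into temp_log
from @my_s3_stage
pattern = '.*\.parquet'
file_format = (type = parquet);

-- Pull typed fields back out, flattening the params array
select
    logcontent:uuid::string as uuid,
    logcontent:event_timestamp::timestamp as event_timestamp,
    p.value:name::string as param_name,
    p.value:value::string as param_value
from temp_log,
    lateral flatten(input => logcontent:params) p;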
Upvotes: 0
Reputation: 9768
This documentation explains how to load Parquet data into multiple columns: Loading Parquet
UPDATE
I'm not sure if the comment below is a response to my answer and, if it is, what its relevance is. Did you read the document and, if you did, what part of it do you still have questions about?
You need to have your data in a stage (presumably an external stage in your case), or possibly in an external table, and then load from that into your table using "COPY INTO table FROM (SELECT ...", with the $1:.. notation allowing you to select the appropriate elements from your Parquet structure.
From the documentation:
/* Load the Parquet data into the relational table. */
/* */
/* A SELECT query in the COPY statement identifies a numbered set of columns in the data files you are */
/* loading from. Note that all Parquet data is stored in a single column ($1). */
/* */
/* Cast element values to the target column data type. */
copy into cities
from (select
$1:continent::varchar,
$1:country:name::varchar,
$1:country:city.bag::variant
from @sf_tut_stage/cities.parquet);
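Note the .bag wrapper in the last expression: depending on which writer produced the file, a repeated group (like params in the question) can surface under a wrapper element, so the exact path may be $1:params or something more nested. A quick way to check, assuming a named Parquet file format exists for the stage:

-- Inspect the raw structure of one row before writing the casts
select $1
from @my_stage/myFile.parquet (file_format => 'my_parquet_format')
limit 1;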
Upvotes: 2