Reputation: 2106
When copying a table from Snowflake using the Snowflake Spark connector, the default behavior is to map semi-structured data (VARIANT, OBJECT, ARRAY) to Spark strings: https://docs.snowflake.net/manuals/user-guide/spark-connector-use.html#from-snowflake-to-spark-sql
For example, given a table in Snowflake:
create table schema.table as
select
    array_construct('1','a') as array_col,
    object_construct('1','a') as obj_col;
Copying this table in PySpark (the format name is the connector's source class, and the connection URL goes in the options):
df = snowflake.sql_context.read.format("net.snowflake.spark.snowflake")\
    .options(**snowflake_options)\
    .load()
results in the dataframe:
> df: pyspark.sql.dataframe.DataFrame
> ARRAY_COL:string
> OBJ_COL:string
Is there currently a way to override this default behavior within the connector, mapping Snowflake OBJECT to Spark StructType and/or Snowflake ARRAY to Spark MapType?
Upvotes: 3
Views: 1965
Reputation: 21
The "column-mapping" option maps column names, not column types. For Snowflake VARIANT/ARRAY/OBJECT columns, a JSON string is returned. It should be possible to convert that JSON string to a MapType with a UDF.
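A minimal sketch of that conversion. The UDF body boils down to parsing the JSON string the connector returns; the Spark registration lines at the end are an untested sketch and assume an existing SparkSession and DataFrame `df`:

```python
import json

# The connector returns OBJECT/ARRAY columns as JSON strings, e.g.:
obj_json = '{"1": "a"}'
array_json = '["1", "a"]'

def parse_variant(s):
    """Parse a Snowflake VARIANT/OBJECT/ARRAY JSON string into a Python value."""
    return None if s is None else json.loads(s)

print(parse_variant(obj_json))    # a dict: {'1': 'a'}
print(parse_variant(array_json))  # a list: ['1', 'a']

# Wiring it up as a Spark UDF (sketch, not run here):
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import MapType, StringType
#   to_map = udf(parse_variant, MapType(StringType(), StringType()))
#   df = df.withColumn("OBJ_COL", to_map("OBJ_COL"))
```

Note that newer Spark versions also offer `pyspark.sql.functions.from_json`, which does the same parsing without a Python UDF if you can supply the schema.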
Upvotes: 1
Reputation: 541
I hope that you are well. I think you may be looking for the columnmapping parameter. I am not an expert on this, though, so I am just sharing the research I found.
Specifically, for mapping a Spark table to a Snowflake table: https://docs.snowflake.net/manuals/user-guide/spark-connector-overview.html#column-mapping
For Snowflake ARRAY to Spark MapType: is the object stored as an array in the Snowflake table? Is this what you were looking for? https://sparkbyexamples.com/spark/spark-sql-map-functions/
Upvotes: 1