newsoldpro
newsoldpro

Reputation: 1

Snowflake: Attempt to Flatten Variant Column returning no result set (Not sure prior step stripping outer array from JSON worked properly)

I am pulling a JSON file from an API and loading it into an S3 bucket using python. From there, I am creating an external stage in Snowflake in preparation to copy the json file into a variant column within a table. The final step is to flatten the variant column and ultimately create a new table based on the json/variant data. However, I am stuck on the step attempting to flatten the variant column. After reading through snowflake documentation and other stack overflow posts, it is uncertain if I am doing the flatten step correctly.

Step 1: Python - Pull a JSON file from an API ('https://fakestoreapi.com/products') and load it into an S3 bucket using python. This appears to be working as expected.

import requests
import boto3
import json

r = requests.get('https://fakestoreapi.com/products')
data = r.text

s3 = boto3.client('s3',
    aws_access_key_id='key_entered_here',
    aws_secret_access_key='secret_entered_here')

s3.put_object(
     Body=json.dumps(data),
     Bucket='s3_bucket_name_entered_here',
     Key='products.json'

Step 2: Snowflake - Create File Format and create external stage in preparation to loading json file into table in snowflake as follows (This step appears to be working, but I am not confident the outer array is being stripped because once the file is loaded into a variant column in step 3, I see outer brackets '[]' wrapping the dataset):

-- Create file format
CREATE OR REPLACE FILE FORMAT JSON_FORMAT TYPE = 'JSON' 
    STRIP_OUTER_ARRAY = TRUE;

-- Create external stage
CREATE OR REPLACE STAGE [Stage_name]
    url='s3_bucket_name'
    credentials=(aws_key_id='key_id_entered_here' aws_secret_key='secret_key_entered_here')
    FILE_FORMAT='JSON_FORMAT';

Step 3: Snowflake - Create raw_source table with variant column and copy json file from external stage into table as follows (sample of result set from SELECT statement of variant column shown below SQL):

CREATE OR REPLACE TABLE raw_source (
  SRC VARIANT);

COPY INTO raw_source
  FROM @FakeStore.Public.FakeStore_Stage
  FILE_FORMAT = JSON_FORMAT;

-- Querying from the src variant column shows outer array still entacted?
SELECT src FROM raw_source;

"[{"id":1,"title":"Fjallraven - Foldsack No. 1 Backpack, Fits 15 Laptops","price":109.95,"description":"Your perfect pack for everyday use and walks in the forest. Stash your laptop (up to 15 inches) in the padded sleeve, your everyday","category":"men's clothing","image":"https://fakestoreapi.com/img/81fPKd-2AYL.AC_SL1500.jpg","rating":{"rate":3.9,"count":120}}, {"id":19,"title":"Opna Women's Short Sleeve Moisture","price":7.95,"description":"100% Polyester, Machine wash, 100% cationic polyester interlock, Machine Wash & Pre Shrunk for a Great Fit, Lightweight, roomy and highly breathable with moisture wicking fabric which helps to keep moisture away, Soft Lightweight Fabric with comfortable V-neck collar and a slimmer fit, delivers a sleek, more feminine silhouette and Added Comfort","category":"women's clothing","image":"https://fakestoreapi.com/img/51eg55uWmdL.AC_UX679.jpg","rating":{"rate":4.5,"count":146}},{"id":20,"title":"DANVOUY Womens T Shirt Casual Cotton Short","price":12.99,"description":"95%Cotton,5%Spandex, Features: Casual, Short Sleeve, Letter Print,V-Neck,Fashion Tees, The fabric is soft and has some stretch., Occasion: Casual/Office/Beach/School/Home/Street. Season: Spring,Summer,Autumn,Winter.","category":"women's clothing","image":"https://fakestoreapi.com/img/61pHAEJ4NML.AC_UX679.jpg","rating":{"rate":3.6,"count":145}}]"

Step 4: Snowflake - The SELECT statement below attempts to flatten variant column, but returns no result set. After reading through snowflake documentation and other stack overflow posts, I'm not certain I am doing this step correctly. This is where I am currently stuck.

SELECT
     final.value:id::number as id
    ,final.value:title::varchar as title
    ,final.value:price::number as price
    ,final.value:description::varchar as description
    ,final.value:category::varchar as category
    ,final.value:image::varchar as image
    ,final.value:rating::varchar as rating
FROM raw_source AS rs, LATERAL FLATTEN(input => rs.src) as final;

Upvotes: 0

Views: 255

Answers (1)

newsoldpro
newsoldpro

Reputation: 1

Thanks to NickW and Mike Walton for their comments on here, as well as some help from folks on a Snowflake forum, I was able to resolve my question here.

There were two things that needed to be updated.

The first pertains to Step 1 in my question. Below is the corrected python --> row 6 updated from 'r.text' to 'r.json()' and row 12 updated to 'Body=data

import requests
import boto3
import json

r = requests.get('https://fakestoreapi.com/products')
data = r.json()

s3 = boto3.client('s3',
    aws_access_key_id='[AccessID]',
    aws_secret_access_key='[AccessKey]')
s3.put_object(
     Body=data,
     Bucket='fakestore01',
     Key='products.json'
)

The second pertains to Step 4 in my question The corrected SQL SELECT statement is as follows:

SELECT
     src:id::number AS ID
    ,src:title::varchar AS Title
    ,src:price::float AS Price
    ,src:description::varchar AS Description
    ,src:category::varchar AS Category
    ,src:image::varchar AS ImageURL
    ,src:rating:rate::float AS RatingRate
    ,src:rating:count::number AS RatingCount
FROM raw_source 

Upvotes: 0

Related Questions