Reputation: 1
I am pulling a JSON file from an API and loading it into an S3 bucket using python. From there, I am creating an external stage in Snowflake in preparation to copy the json file into a variant column within a table. The final step is to flatten the variant column and ultimately create a new table based on the json/variant data. However, I am stuck on the step attempting to flatten the variant column. After reading through snowflake documentation and other stack overflow posts, it is uncertain if I am doing the flatten step correctly.
Step 1: Python - Pull a JSON file from an API ('https://fakestoreapi.com/products') and load it into an S3 bucket using python. This appears to be working as expected.
import requests
import boto3
import json
r = requests.get('https://fakestoreapi.com/products')
data = r.text
s3 = boto3.client('s3',
aws_access_key_id='key_entered_here',
aws_secret_access_key='secret_entered_here')
s3.put_object(
Body=json.dumps(data),
Bucket='s3_bucket_name_entered_here',
Key='products.json'
Step 2: Snowflake - Create File Format and create external stage in preparation to loading json file into table in snowflake as follows (This step appears to be working, but I am not confident the outer array is being stripped because once the file is loaded into a variant column in step 3, I see outer brackets '[]' wrapping the dataset):
-- Create file format
CREATE OR REPLACE FILE FORMAT JSON_FORMAT TYPE = 'JSON'
STRIP_OUTER_ARRAY = TRUE;
-- Create external stage
CREATE OR REPLACE STAGE [Stage_name]
url='s3_bucket_name'
credentials=(aws_key_id='key_id_entered_here' aws_secret_key='secret_key_entered_here')
FILE_FORMAT='JSON_FORMAT';
Step 3: Snowflake - Create raw_source table with variant column and copy json file from external stage into table as follows (sample of result set from SELECT statement of variant column shown below SQL):
CREATE OR REPLACE TABLE raw_source (
SRC VARIANT);
COPY INTO raw_source
FROM @FakeStore.Public.FakeStore_Stage
FILE_FORMAT = JSON_FORMAT;
-- Querying from the src variant column shows outer array still entacted?
SELECT src FROM raw_source;
"[{"id":1,"title":"Fjallraven - Foldsack No. 1 Backpack, Fits 15 Laptops","price":109.95,"description":"Your perfect pack for everyday use and walks in the forest. Stash your laptop (up to 15 inches) in the padded sleeve, your everyday","category":"men's clothing","image":"https://fakestoreapi.com/img/81fPKd-2AYL.AC_SL1500.jpg","rating":{"rate":3.9,"count":120}}, {"id":19,"title":"Opna Women's Short Sleeve Moisture","price":7.95,"description":"100% Polyester, Machine wash, 100% cationic polyester interlock, Machine Wash & Pre Shrunk for a Great Fit, Lightweight, roomy and highly breathable with moisture wicking fabric which helps to keep moisture away, Soft Lightweight Fabric with comfortable V-neck collar and a slimmer fit, delivers a sleek, more feminine silhouette and Added Comfort","category":"women's clothing","image":"https://fakestoreapi.com/img/51eg55uWmdL.AC_UX679.jpg","rating":{"rate":4.5,"count":146}},{"id":20,"title":"DANVOUY Womens T Shirt Casual Cotton Short","price":12.99,"description":"95%Cotton,5%Spandex, Features: Casual, Short Sleeve, Letter Print,V-Neck,Fashion Tees, The fabric is soft and has some stretch., Occasion: Casual/Office/Beach/School/Home/Street. Season: Spring,Summer,Autumn,Winter.","category":"women's clothing","image":"https://fakestoreapi.com/img/61pHAEJ4NML.AC_UX679.jpg","rating":{"rate":3.6,"count":145}}]"
Step 4: Snowflake - The SELECT statement below attempts to flatten variant column, but returns no result set. After reading through snowflake documentation and other stack overflow posts, I'm not certain I am doing this step correctly. This is where I am currently stuck.
SELECT
final.value:id::number as id
,final.value:title::varchar as title
,final.value:price::number as price
,final.value:description::varchar as description
,final.value:category::varchar as category
,final.value:image::varchar as image
,final.value:rating::varchar as rating
FROM raw_source AS rs, LATERAL FLATTEN(input => rs.src) as final;
Upvotes: 0
Views: 255
Reputation: 1
Thanks to NickW and Mike Walton for their comments on here, as well as some help from folks on a Snowflake forum, I was able to resolve my question here.
There were two things that needed to be updated.
The first pertains to Step 1 in my question. Below is the corrected python --> row 6 updated from 'r.text' to 'r.json()' and row 12 updated to 'Body=data
import requests
import boto3
import json
r = requests.get('https://fakestoreapi.com/products')
data = r.json()
s3 = boto3.client('s3',
aws_access_key_id='[AccessID]',
aws_secret_access_key='[AccessKey]')
s3.put_object(
Body=data,
Bucket='fakestore01',
Key='products.json'
)
The second pertains to Step 4 in my question The corrected SQL SELECT statement is as follows:
SELECT
src:id::number AS ID
,src:title::varchar AS Title
,src:price::float AS Price
,src:description::varchar AS Description
,src:category::varchar AS Category
,src:image::varchar AS ImageURL
,src:rating:rate::float AS RatingRate
,src:rating:count::number AS RatingCount
FROM raw_source
Upvotes: 0