Username784792

Reputation: 135

What is the best way to run a report in a notebook when connected to Snowflake with the Python connector?

My last couple of questions have been about how to connect to Snowflake and add and read data with the Python connector in an IPython notebook. However, I am having trouble with the next step: creating a report from the data I want to visualize.

I would like to upload all of the data, store it, then analyze it, kind of like a homemade dashboard.

So what I have done so far is a small version of this:

  1. Stage my data from a local file, and append new data each time I open the notebook
  2. Use the Python connector to pull any data I need from storage
  3. Create visualizations with numpy objects in the local notebook (a rough sketch of steps 2 and 3 is below)
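For example, a rough sketch of steps 2 and 3 using the Python connector and matplotlib might look like the following. The connection parameters, table and column names are placeholders I made up for illustration, not my actual setup:

    import snowflake.connector
    import matplotlib.pyplot as plt

    # Placeholder credentials and objects -- substitute your own account, database and table.
    conn = snowflake.connector.connect(
        user="MY_USER",
        password="MY_PASSWORD",
        account="MY_ACCOUNT",
        warehouse="MY_WH",
        database="MY_DB",
        schema="PUBLIC",
    )
    cur = conn.cursor()

    # Pull an aggregated slice of the stored data; the aggregation happens in the SQL call.
    cur.execute("""
        SELECT event_date, SUM(amount) AS total_amount
        FROM daily_data
        GROUP BY event_date
        ORDER BY event_date
    """)
    rows = cur.fetchall()
    cur.close()
    conn.close()

    # Visualize locally in the notebook.
    dates, totals = zip(*rows)
    plt.plot(dates, totals)
    plt.title("Daily totals")
    plt.show()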

My data will start out very small, but over time I would imagine I would have to move computation to the cloud to minimize the memory used locally for the small dashboard.

My question is: my data is called from an API that returns JSON files, and new data is no bigger than 75 MB a day across 8 columns, with two aggregate calls on the data done in the SQL call. If I run these visualizations monthly, is it better to aggregate the information in Snowflake, or locally?

Upvotes: 0

Views: 183

Answers (2)

doyouevendata

Reputation: 179

My question is: my data is called from an API that returns JSON files, and new data is no bigger than 75 MB a day across 8 columns, with two aggregate calls on the data done in the SQL call. If I run these visualizations monthly, is it better to aggregate the information in Snowflake, or locally?

I would flatten your data in Python or Snowflake - depending on which you feel more comfortable using or how complex the data is. You can just do everything on the straight JSON, although I would rarely look to design something that way myself (it's going to be the slowest to query).
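A rough sketch of flattening inside Snowflake from the Python connector - the stage, table and field names here are made up for illustration, so adjust them to your actual JSON shape:

    import snowflake.connector

    # Placeholder connection -- use your own credentials.
    conn = snowflake.connector.connect(user="MY_USER", password="MY_PASSWORD", account="MY_ACCOUNT")
    cur = conn.cursor()

    # Assumes the raw JSON landed in a single VARIANT column called V.
    # LATERAL FLATTEN turns the nested array into rows; the ::casts pull out typed columns.
    cur.execute("""
        CREATE OR REPLACE TABLE daily_data AS
        SELECT r.v:event_date::date  AS event_date,
               f.value:name::string  AS name,
               f.value:amount::float AS amount
        FROM raw_json r,
             LATERAL FLATTEN(input => r.v:records) f
    """)
    cur.close()
    conn.close()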

As far as aggregating the data goes, I'd always do that in Snowflake. If you would like to slice and dice the data in various ways, you may look to design a data mart data model and have your dashboard simply aggregate data on the fly via queries. Snowflake should be pretty good at that, but for additional speed, aggregating it up to the monthly level may be a good idea too.
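Something like this would roll the flattened data up to month level entirely in Snowflake (again, the table and column names are just placeholders):

    import snowflake.connector

    # Placeholder connection -- substitute your own credentials.
    conn = snowflake.connector.connect(user="MY_USER", password="MY_PASSWORD", account="MY_ACCOUNT")
    cur = conn.cursor()

    # Build a small summary table so the dashboard only ever reads pre-aggregated rows.
    cur.execute("""
        CREATE OR REPLACE TABLE monthly_summary AS
        SELECT DATE_TRUNC('month', event_date) AS month,
               SUM(amount)                     AS total_amount,
               COUNT(*)                        AS row_count
        FROM daily_data
        GROUP BY 1
        ORDER BY 1
    """)
    cur.close()
    conn.close()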

You can probably mature your process from being driven by a local Python script to something like a serverless Lambda that is event-driven or run on a scheduler as well.

Upvotes: 0

Trevor

Reputation: 4860

Put the raw data into Snowflake. Use tasks and procedures to aggregate it and store the result. Or better yet, don't do any aggregations except when you want the data - let Snowflake do the aggregations in real time off the raw data.
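As a rough illustration of the tasks approach - the object names, schedule and SQL below are assumptions for the sake of example, not something from your question:

    import snowflake.connector

    # Placeholder connection -- substitute your own credentials, warehouse and database.
    conn = snowflake.connector.connect(user="MY_USER", password="MY_PASSWORD", account="MY_ACCOUNT")
    cur = conn.cursor()

    # A task that rebuilds a monthly summary at 02:00 UTC on the 1st of every month.
    cur.execute("""
        CREATE OR REPLACE TASK refresh_monthly_summary
            WAREHOUSE = MY_WH
            SCHEDULE = 'USING CRON 0 2 1 * * UTC'
        AS
            CREATE OR REPLACE TABLE monthly_summary AS
            SELECT DATE_TRUNC('month', event_date) AS month,
                   SUM(amount)                     AS total_amount
            FROM daily_data
            GROUP BY 1
    """)

    # Tasks are created suspended; resume it so the schedule actually runs.
    cur.execute("ALTER TASK refresh_monthly_summary RESUME")
    cur.close()
    conn.close()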

I think what you might be asking is whether you should ETL your data or ELT your data:

  • ETL: Extract, Transform, Load (in that order) - Extract data from your API. Transform it locally on your computer. Load it into Snowflake.
  • ELT: Extract, Load, Transform (in that order) - Extract data from your API. Load it into Snowflake. Transform it after it's in Snowflake.

Both ETL and ELT are valid. Many companies use both approaches with Snowflake interchangeably. But Snowflake was built to be, in a sense, your data lake - the idea being, "Just throw all your data up here and then use our awesome compute and storage resources to transform it quickly and easily."
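A bare-bones ELT sketch with the Python connector looks like this - the stage, table and file path are placeholders, and the point is simply that the raw JSON lands in Snowflake untouched:

    import snowflake.connector

    # Placeholder connection and object names, for illustration only.
    conn = snowflake.connector.connect(user="MY_USER", password="MY_PASSWORD", account="MY_ACCOUNT",
                                       warehouse="MY_WH", database="MY_DB", schema="PUBLIC")
    cur = conn.cursor()

    # E and L: land the raw API output in a single VARIANT column, no local transformation.
    cur.execute("CREATE TABLE IF NOT EXISTS raw_json (v VARIANT)")
    cur.execute("CREATE STAGE IF NOT EXISTS api_stage FILE_FORMAT = (TYPE = 'JSON')")
    cur.execute("PUT file:///tmp/api_dump.json @api_stage")
    cur.execute("COPY INTO raw_json FROM @api_stage")

    # T: transform later, inside Snowflake, with SQL (flattening, casting, aggregating).
    cur.close()
    conn.close()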

Do a Google search on "Snowflake ELT" or "ELT vs ETL" for more information.

Here are some considerations either way off the top of my head:

Tools you're using: Some tools like SSIS were built w/ ETL in mind - transformation of the data before you store it in your warehouse. That's not to say you can't ELT, but it wasn't built w/ ELT in mind. More modern tools - like Fivetran or even Snowpipe - assume you're going to consolidate all your data in Snowflake and then transform it once it's up there. I really like the ELT paradigm - i.e., just get your data into the cloud and transform it quickly once it's up there.

Size and growth of your data: If your data is growing, it becomes harder and harder to manage it on local resources. It might not matter when your data is in gigabytes or millions of rows. But as you get into billions of rows or terabytes of data, the scalability of the cloud can't be matched. If you feel like this might happen and you think putting it into the cloud isn't a premature optimization, I'd load your raw data into Snowflake and transform it after it's up there.

Compute and Storage Capacity: Maybe you have a massive amount of storage and compute at your fingertips. Maybe you have an on-prem cluster you can provision resources from at the drop of a hat. Most people don't have that.

Short-Term Compute and Storage Cost: Maybe you have some modest resources you can use today and you'd rather not pay Snowflake while your modest resources can do the job. Having said that, it sounds like the compute to transform this data will be pretty minimal, and you'll only be doing it once a day or once a month. If that's the case, the compute cost will be very minimal.

Data Security or Privacy: Maybe you have a need to anonymize data before moving it to the public cloud. If this is important to you, you should look into Snowflake's security features; but if you're in an organization where it's super difficult to get a security review and you need to move forward with something, transforming it on-prem while waiting for the security review is a good alternative.

Data Structure: Do you have duplicates in your data? Do you need access to other data in Snowflake to join on in order to perform your transformations? As you start putting more and more data into Snowflake, it makes sense to transform it after it's in Snowflake - that's where all your data is, and you'll find it easier to join, query and transform in the cloud where all your other data already lives.

Upvotes: 1
