How to create a key:value pair or 2 sets of lists returning item ids and corresponding number of purchases in sql (or snowflake to be more specific)?

Question

I have an enormous clickstream data which I can pull the timestamp, user_ids corresponding item_ids that customers clicked.

I am trying to create a dataset that will have the unique customer ids (user_id) on each row and another column that has key:value pair that provides me the list of items and number of times corresponding item is clicked like a python dictionary (or this can be in two columns with items and corresponding num. clicks as a sorted list from most frequent to least). The number of clicks for an item is the times that item_id occur for a fixed user_id. Basically, every occurrence of item is one click. If it occurs 10 times for a fixed user_id, that means that click_count = 10 for that user_id, item_id pair.

Here is an example:

USER_ID	CLICKED_ITEMS
user1	item3: 20, item2: 10, item1: 5
user2	item1: 10, item2: 1, item3: 1
user3	item2: 1
user4	item1: 10, item2: 2

or

USER_ID	CLICKED_ITEMS	NUM_CLICKS
user1	item3, item2, item1	20, 10, 5
user2	item1, item2, item3	10, 1, 1
user3	item2	1
user4	item1, item2	0, 2

Is this possible using SQL (esp. for Snowflake syntax)? TIA

I tried

SELECT user_id, using ARRAY_UNIQUE_AGG(item_id) AS clicked_items
FROM clickstream_table
GROUP_BY user_id

and got unique user_ids and corresponding list of items but not sure how to find number of times each item is clicked. I can do this easily using pandas but unfortunately I cannot even pull 1 month of data due to size and timeout limitations (3hr limit) on warehouse.

How to create a key:value pair or 2 sets of lists returning item ids and corresponding number of purchases in sql (or snowflake to be more specific)?

Answers (1)

Related Questions

USER_ID	CLICKED_ITEMS
user1	[ { "item3": 20 }, { "item2": 10 }, { "item1": 5 } ]
user2	[ { "item1": 10 }, { "item2": 1 }, { "item3": 1 } ]
user3	[ { "item2": 1 } ]
user4	[ { "item2": 10 }, { "item2": 2 } ]