Berra
Berra

Reputation: 43

Find intersection between two columns consisting of comma-separated items in BigQuery SQL

I have several columns in a joined table which we can call 'sessions_and_transactions' with data on web sessions transacting (purchasing one or several items) on an ecommerce platform. Two of these columns are 'products_added_to_cart_from_product_page' and 'products_purchased' which are both of string data type and consisting of comma-separated product id:s. Commonly, the comma-separated products in 'products_added_to_cart_from_product_page' will also be in the 'products_purchased' column. Sometimes, however, the user doesn't purchase all or even any of the products added to cart product page. The user can also add products to cart from other pages than the product page (for instance, from a recommendations page).

For each session (session_id), I would like to insert into a new column of all product_id:s in 'products_added_to_cart_from_product_page', that are also in 'products_purchased'. So basically, the set intersection of products.

An example: enter image description here

I remind you that the comma-separated lists of items are not of array data type

Upvotes: 1

Views: 1082

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173046

Consider below options

using intersect distinct

select *,
  array_to_string(array(
    select * from unnest(split(products_added_to_cart_from_product_page, ', '))
    intersect distinct
    select * from unnest(split(products_purchased, ', '))),
  ', ', '') as New_purchased_from_product_page
from `project.dataset.table`

using join

select *,
  ( select string_agg(product, ', ')
    from unnest(split(products_added_to_cart_from_product_page, ', ')) product
    join unnest(split(products_purchased, ', ')) product
    using(product)
  ) as New_purchased_from_product_page
from `project.dataset.table`

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1270401

You can still use array functions, via split(). This puts the final value as an array (but you can easily convert it):

select t.*,
       (select array_agg(el2)
        from unnest(split(t.col2, ',')) el2 join
             unnest(split(t.col3, ',')) el3
             on el2 = el3
       ) as in_common
from t;

Note: If the delimiter is really comma-space then use ', ' for the split().

Upvotes: 1

Related Questions