Calculating and displaying customer lifetime value histogram with BigQuery and Data Studio

Consider a table in Google BigQuery containing purchase records for customer. For the sake of simplicity, let's focus on the following properties: customer_id, product_id, amount

I'd like to create a Google Data Studio report from the above data set showing a customer lifetime value histogram. The customer lifetime value is the sum of amount for any given customer. The histogram would show how many customers fall into a certain bucket by their total amount - I would define the buckets like 0-10, 10-20, 20-30 etc. value ranges.

Like this: enter image description here

Finally, I'd also like to filter the histogram by product_id. When the filter is active, the histogram would show the totals for customers who - at least once - purchased the given product.

As of this moment, I think this is not possible to implement in Datastudio, but I hope I am wrong.

Things I've tried so far:

  1. Displaying an average customer lifetime value for the whole dataset is easy, via a calculated field in Datastudio as SUM(amount) / COUNT(customer_id)

  2. For creating a histogram, I don't see any way purely in Data Studio (based on the above data set). I think I need to create a view of the original table, consisting a single row for each customer with the total amount. The bucket assignment could be implemented either in Big Query or in Data Studio with CASE ... WHEN.

However, for the final step, i.e. creating a product filter that filters the histogram for those customers who purchased the given product, I have no clue how to approach this.

Any thoughts?

Upvotes: 1

Views: 2624

Answers (1)

Guillem Xercavins
Guillem Xercavins

Reputation: 7058

I was able to do a similar reproduction to what you describe but it's not straightforward so I'll try to detail everything. The main idea is to have two data sources from the same table: one contains customer_id and product_id so that we can filter it while the other one contains customer_id and the already calculated amount_bucket field. This way we can join it (blend data) on customer_id and filter according to product_id which won't change the amount_bucket calculations.

I used the following script to create some data in BigQuery:

CREATE OR REPLACE TABLE data_studio.histogram
(
  customer_id STRING,
  product_id STRING,
  amount INT64
);

INSERT INTO data_studio.histogram (customer_id, product_id, amount)
VALUES ('John', 'Game', 60),
       ('John', 'TV', 800),
       ('John', 'Console', 300),
       ('Paul', 'Sofa', 1200),
       ('George', 'TV', 750),
       ('Ringo', 'Movie', 20),
       ('Ringo', 'Console', 250)
;

Then I connect directly to the BigQuery table and get the following fields. Data source is called histogram:

enter image description here

We add our second data source (BigQuery) using a custom query:

SELECT
  customer_id,
  CASE
    WHEN SUM(amount) < 500 THEN '0-500'
    WHEN SUM(amount) < 1000 THEN '500-1000'
    WHEN SUM(amount) < 1500 THEN '1000-1500'
    ELSE '1500+'
END
  AS amount_bucket
FROM
  data_studio.histogram
GROUP BY
  customer_id

With only the latter we could already do a basic histogram with the following configuration:

enter image description here

Dimension is amount_bucket, metric is Record count. I made a bucket_order custom field to sort it as lexicographically '1000-1500' comes before '500-1000':

CASE 
  WHEN amount_bucket = '0-500' THEN 0
  WHEN amount_bucket = '500-1000' THEN 1
  WHEN amount_bucket = '1000-1500' THEN 2
  ELSE 3
END

Now we add the product_id filter on top and a new chart with the following configuration:

enter image description here

Note that metric is CTD (Count Distinct) of customer_id and the Blended data data source is implemented as:

enter image description here

An example where I filter by TV so only George and John appear but the other products are still counted for the total amount calculation:

enter image description here

I hope it works for you.

Upvotes: 2

Related Questions