Reputation: 6763
Consider a table in Google BigQuery containing purchase records for customer. For the sake of simplicity, let's focus on the following properties:
customer_id, product_id, amount
I'd like to create a Google Data Studio report from the above data set showing a customer lifetime value histogram. The customer lifetime value is the sum of amount
for any given customer. The histogram would show how many customers fall into a certain bucket by their total amount - I would define the buckets like 0-10, 10-20, 20-30 etc. value ranges.
Finally, I'd also like to filter the histogram by product_id. When the filter is active, the histogram would show the totals for customers who - at least once - purchased the given product.
As of this moment, I think this is not possible to implement in Datastudio, but I hope I am wrong.
Things I've tried so far:
Displaying an average customer lifetime value for the whole dataset is easy, via a calculated field in Datastudio as SUM(amount) / COUNT(customer_id)
For creating a histogram, I don't see any way purely in Data Studio (based on the above data set). I think I need to create a view of the original table, consisting a single row for each customer with the total amount. The bucket assignment could be implemented either in Big Query or in Data Studio with CASE ... WHEN
.
However, for the final step, i.e. creating a product filter that filters the histogram for those customers who purchased the given product, I have no clue how to approach this.
Any thoughts?
Upvotes: 1
Views: 2624
Reputation: 7058
I was able to do a similar reproduction to what you describe but it's not straightforward so I'll try to detail everything. The main idea is to have two data sources from the same table: one contains customer_id
and product_id
so that we can filter it while the other one contains customer_id
and the already calculated amount_bucket
field. This way we can join it (blend data) on customer_id
and filter according to product_id
which won't change the amount_bucket
calculations.
I used the following script to create some data in BigQuery:
CREATE OR REPLACE TABLE data_studio.histogram
(
customer_id STRING,
product_id STRING,
amount INT64
);
INSERT INTO data_studio.histogram (customer_id, product_id, amount)
VALUES ('John', 'Game', 60),
('John', 'TV', 800),
('John', 'Console', 300),
('Paul', 'Sofa', 1200),
('George', 'TV', 750),
('Ringo', 'Movie', 20),
('Ringo', 'Console', 250)
;
Then I connect directly to the BigQuery table and get the following fields. Data source is called histogram
:
We add our second data source (BigQuery
) using a custom query:
SELECT
customer_id,
CASE
WHEN SUM(amount) < 500 THEN '0-500'
WHEN SUM(amount) < 1000 THEN '500-1000'
WHEN SUM(amount) < 1500 THEN '1000-1500'
ELSE '1500+'
END
AS amount_bucket
FROM
data_studio.histogram
GROUP BY
customer_id
With only the latter we could already do a basic histogram with the following configuration:
Dimension is amount_bucket
, metric is Record count
. I made a bucket_order
custom field to sort it as lexicographically '1000-1500' comes before '500-1000':
CASE
WHEN amount_bucket = '0-500' THEN 0
WHEN amount_bucket = '500-1000' THEN 1
WHEN amount_bucket = '1000-1500' THEN 2
ELSE 3
END
Now we add the product_id
filter on top and a new chart with the following configuration:
Note that metric is CTD (Count Distinct) of customer_id
and the Blended data
data source is implemented as:
An example where I filter by TV
so only George
and John
appear but the other products are still counted for the total amount calculation:
I hope it works for you.
Upvotes: 2