Reputation: 207
I have a ClickHouse database with a simple table with two fields (pagePath String, pageviews Int).
I want to sum pageviews for each filter + value, to find out which filters users use most. Filters are separated by commas.
Example data:
pagePath | pageviews |
---|---|
/url1/filter1.value1 | 1 |
/url1/filter1.value2 | 2 |
/url1/filter1.value3,filter1.value2 | 3 |
/url1/filter1.value4,filter3.value2 | 4 |
/url1/filter1.value5,filter3.value2,filter1.value2 | 5 |
/url1/filter2.value1,filter3.value2,filter1.value2 | 6 |
/url1/filter2.value2 | 7 |
/url-2/filter2.value3 | 8 |
/url-2/filter2.value4 | 9 |
/url-11/filter3.value1 | 10 |
/url-21/filter3.value1 | 11 |
/url1/filter3.value2 | 12 |
/url1/filter3.value3 | 13 |
/url1/filter3.value4 | 14 |
create table T (pagePath String , pageviews Int64) Engine=Memory;
insert into T values ('/url1/filter1.value1',1);
insert into T values ('/url1/filter1.value2',2);
insert into T values ('/url1/filter1.value3,filter1.value2',3);
insert into T values ('/url1/filter1.value4,filter3.value2',4);
insert into T values ('/url1/filter1.value5,filter3.value2,filter1.value2',5);
insert into T values ('/url1/filter2.value1,filter3.value2,filter1.value2',6);
insert into T values ('/url1/filter2.value2',7);
insert into T values ('/url-2/filter2.value3',8);
insert into T values ('/url-2/filter2.value4',9);
insert into T values ('/url-11/filter3.value1',10);
insert into T values ('/url-21/filter3.value1',11);
insert into T values ('/url1/filter3.value2',12);
insert into T values ('/url1/filter3.value3',13);
insert into T values ('/url1/filter3.value4',14);
I want to get the sum of pageviews for each filter + value:
Filters | pageviews |
---|---|
filter1.value1 | 1 |
filter1.value2 | 16 (2 + 3 +5 +6) |
filter1.value3 | 3 |
filter1.value4 | 4 |
filter1.value5 | 5 |
filter2.value1 | 6 |
filter2.value2 | 7 |
filter2.value3 | 8 |
filter2.value4 | 9 |
filter3.value1 | 10 |
filter3.value1 | 11 |
filter3.value2 | 12 (4+5+6+12) |
filter3.value3 | 13 |
filter3.value4 | 14 |
I also want to get the sum of pageviews for each filter (without values):
Filters | pageviews |
---|---|
filter1 | 15 (1, 2, 3, 4, 5) |
filter2 | 30 (6+ 7+ 8+ 9) |
filter3 | 60 ( 10, 11, 12, 13, 14) |
I tried:
select
arrJoinFilters,
sum (PV) as totales
from
(
SELECT
arrJoinFilters,
splitByChar(',',replaceRegexpAll(pagePath,'^/url.*/(.*\..*)$','\\1')) arrFilter,
pageviews as PV
FROM
Table ARRAY JOIN arrFilter AS arrJoinFilters
GROUP by arrFilter,PV,arrJoinFilters
)
group by
arrJoinFilters
order by arrJoinFilters
But I think something is wrong, and I can't get the second desired result.
Thanks!
Upvotes: 0
Views: 1101
Reputation: 13310
Note that 4+5+6+12 = 27 (not 12), so filter3.value2 totals 27:
SELECT
arrayJoin(splitByChar(',',replaceRegexpAll(pagePath,'^/url.*/(.*\..*)$','\\1'))) f,
sum(pageviews) as PV
FROM T
GROUP by f
order by f
┌─f──────────────┬─PV─┐
│ filter1.value1 │ 1 │
│ filter1.value2 │ 16 │
│ filter1.value3 │ 3 │
│ filter1.value4 │ 4 │
│ filter1.value5 │ 5 │
│ filter2.value1 │ 6 │
│ filter2.value2 │ 7 │
│ filter2.value3 │ 8 │
│ filter2.value4 │ 9 │
│ filter3.value1 │ 21 │
│ filter3.value2 │ 27 │
│ filter3.value3 │ 13 │
│ filter3.value4 │ 14 │
└────────────────┴────┘
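To see what each step of that expression contributes, here is a small sketch that runs the same functions on a single literal path taken from the sample data. The aliases `path`, `filters`, and `parts` are only illustrative names, not part of the original query.

```sql
-- Illustrative only: the literal below is one row from the question's sample data.
SELECT
    '/url1/filter1.value5,filter3.value2,filter1.value2' AS path,
    replaceRegexpAll(path, '^/url.*/(.*\..*)$', '\\1') AS filters,  -- drop the '/url.../' prefix
    splitByChar(',', filters) AS parts,                              -- one array element per filter.value
    arrayJoin(parts) AS f                                            -- one output row per array element
```

Because `arrayJoin` emits one row per array element, this path produces three rows, and `sum(pageviews) ... GROUP BY f` then adds the row's pageviews to each of its filter.value pairs.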
SELECT splitByChar('.', f)[1] AS x, sum(PV), groupArray(PV), groupArray(f)
FROM
(
    SELECT
        arrayJoin(splitByChar(',', replaceRegexpAll(pagePath, '^/url.*/(.*\..*)$', '\\1'))) AS f,
        sum(pageviews) AS PV
    FROM T
    GROUP BY f
    ORDER BY f
)
GROUP BY x
┌─x───────┬─sum(PV)─┬─groupArray(PV)─┬─groupArray(f)──────────────────────────────────────────────────────────────────────────┐
│ filter2 │ 30 │ [6,7,8,9] │ ['filter2.value1','filter2.value2','filter2.value3','filter2.value4'] │
│ filter3 │ 75 │ [21,27,13,14] │ ['filter3.value1','filter3.value2','filter3.value3','filter3.value4'] │
│ filter1 │ 29 │ [1,16,3,4,5] │ ['filter1.value1','filter1.value2','filter1.value3','filter1.value4','filter1.value5'] │
└─────────┴─────────┴────────────────┴────────────────────────────────────────────────────────────────────────────────────────┘
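Note that the per-filter totals above count a row once for every value of that filter it contains: `/url1/filter1.value3,filter1.value2` adds its 3 pageviews to filter1 twice. If each row should count only once per filter, a possible variant (a sketch, assuming the table `T` from the question and that the filters are always the last path segment) is to reduce every row to its distinct filter names before aggregating. The alias `filter` is an illustrative name.

```sql
-- Sketch: count each row's pageviews once per distinct filter name.
SELECT
    arrayJoin(
        arrayDistinct(
            arrayMap(
                x -> splitByChar('.', x)[1],                       -- 'filter1.value2' -> 'filter1'
                splitByChar(',', splitByChar('/', pagePath)[-1])   -- last path segment, split on commas
            )
        )
    ) AS filter,
    sum(pageviews) AS PV
FROM T
GROUP BY filter
ORDER BY filter
```

On the sample data this should give 21 for filter1 (1+2+3+4+5+6), 30 for filter2, and 75 for filter3. Only filter1 differs from the result above, because only filter1 appears more than once within a single row.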
Upvotes: 1