lino

Reputation: 207

Extract and sum values with subfields inside string using ClickHouse

I have a ClickHouse database with a simple table that has two fields (pagePath String, pageviews Int64).

I want to sum the visits for each filter + value, to find out the pageviews per filter (i.e. which filters are used most). Filters are separated by commas.

Example data

pagePath pageviews
/url1/filter1.value1 1
/url1/filter1.value2 2
/url1/filter1.value3,filter1.value2 3
/url1/filter1.value4,filter3.value2 4
/url1/filter1.value5,filter3.value2,filter1.value2 5
/url1/filter2.value1,filter3.value2,filter1.value2 6
/url1/filter2.value2 7
/url-2/filter2.value3 8
/url-2/filter2.value4 9
/url-11/filter3.value1 10
/url-21/filter3.value1 11
/url1/filter3.value2 12
/url1/filter3.value3 13
/url1/filter3.value4 14

create table T (pagePath String , pageviews Int64) Engine=Memory;

insert into T values ('/url1/filter1.value1',1);
insert into T values ('/url1/filter1.value2',2);
insert into T values ('/url1/filter1.value3,filter1.value2',3);
insert into T values ('/url1/filter1.value4,filter3.value2',4);
insert into T values ('/url1/filter1.value5,filter3.value2,filter1.value2',5);
insert into T values ('/url1/filter2.value1,filter3.value2,filter1.value2',6);
insert into T values ('/url1/filter2.value2',7);
insert into T values ('/url-2/filter2.value3',8);
insert into T values ('/url-2/filter2.value4',9);
insert into T values ('/url-11/filter3.value1',10);
insert into T values ('/url-21/filter3.value1',11);
insert into T values ('/url1/filter3.value2',12);
insert into T values ('/url1/filter3.value3',13);
insert into T values ('/url1/filter3.value4',14);

I want to get the sum of pageviews for each filter + value:

Filters pageviews
filter1.value1 1
filter1.value2 16 (2 + 3 +5 +6)
filter1.value3 3
filter1.value4 4
filter1.value5 5
filter2.value1 6
filter2.value2 7
filter2.value3 8
filter2.value4 9
filter3.value1 10
filter3.value1 11
filter3.value2 12 (4+5+6+12)
filter3.value3 13
filter3.value4 14

I also want to get the sum of pageviews for each filter (without values):

Filters pageviews
filter1 15 (1, 2, 3, 4, 5)
filter2 30 (6+ 7+ 8+ 9)
filter3 60 ( 10, 11, 12, 13, 14)

I tried with:

select
    arrJoinFilters,
    sum (PV) as totales
from
    (
    SELECT
        arrJoinFilters,
        splitByChar(',',replaceRegexpAll(pagePath,'^/url.*/(.*\..*)$','\\1')) arrFilter,
        pageviews as PV
    FROM
        Table ARRAY JOIN arrFilter AS arrJoinFilters
    GROUP by arrFilter,PV,arrJoinFilters
        )
group by
    arrJoinFilters
    order by arrJoinFilters

But I think something is wrong, and I don't get the second desired result.

Thanks!

Upvotes: 0

Views: 1101

Answers (1)

Denny Crane

Reputation: 13310

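Note that the expected values in the question are off: for filter3.value2, 4+5+6+12 = 27 (not 12), and filter3.value1 should be a single row with 10+11 = 21.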

SELECT
      arrayJoin(splitByChar(',',replaceRegexpAll(pagePath,'^/url.*/(.*\..*)$','\\1'))) f,
      sum(pageviews) as PV
FROM T  
GROUP by f
order by f

┌─f──────────────┬─PV─┐
│ filter1.value1 │  1 │
│ filter1.value2 │ 16 │
│ filter1.value3 │  3 │
│ filter1.value4 │  4 │
│ filter1.value5 │  5 │
│ filter2.value1 │  6 │
│ filter2.value2 │  7 │
│ filter2.value3 │  8 │
│ filter2.value4 │  9 │
│ filter3.value1 │ 21 │
│ filter3.value2 │ 27 │
│ filter3.value3 │ 13 │
│ filter3.value4 │ 14 │
└────────────────┴────┘


select splitByChar('.',f)[1] x, sum(PV), groupArray(PV), groupArray(f)
from (
SELECT
  arrayJoin(splitByChar(',',replaceRegexpAll(pagePath,'^/url.*/(.*\..*)$','\\1'))) f,
  sum(pageviews) as PV
FROM T  
GROUP by f
order by f) group by x


┌─x───────┬─sum(PV)─┬─groupArray(PV)─┬─groupArray(f)──────────────────────────────────────────────────────────────────────────┐
│ filter2 │      30 │ [6,7,8,9]      │ ['filter2.value1','filter2.value2','filter2.value3','filter2.value4']                  │
│ filter3 │      75 │ [21,27,13,14]  │ ['filter3.value1','filter3.value2','filter3.value3','filter3.value4']                  │
│ filter1 │      29 │ [1,16,3,4,5]   │ ['filter1.value1','filter1.value2','filter1.value3','filter1.value4','filter1.value5'] │
└─────────┴─────────┴────────────────┴────────────────────────────────────────────────────────────────────────────────────────┘
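Note that the query above adds up the per-value sums, so a row whose path contains the same filter more than once (for example '/url1/filter1.value3,filter1.value2') is counted once per value. If each row should instead count only once per filter, a possible variant is sketched below. It assumes the same table T and the same path pattern as above, and uses arrayMap and arrayDistinct to reduce each row to its distinct filter names before arrayJoin; whether this is the semantics the question wants is an assumption.

```sql
-- Sketch: count each row's pageviews at most once per filter name.
-- Assumes table T (pagePath String, pageviews Int64) as defined in the question.
SELECT
    arrayJoin(
        arrayDistinct(                                   -- drop repeated filter names within one row
            arrayMap(s -> splitByChar('.', s)[1],        -- 'filter1.value2' -> 'filter1'
                splitByChar(',', replaceRegexpAll(pagePath, '^/url.*/(.*\..*)$', '\\1'))
            )
        )
    ) AS x,
    sum(pageviews) AS PV
FROM T
GROUP BY x
ORDER BY x
```

On the sample data this should give filter1 = 21 (1+2+3+4+5+6), filter2 = 30 (6+7+8+9) and filter3 = 75 (4+5+6+10+11+12+13+14). Only filter1 differs from the result above, because filter1 is the only filter that appears twice within a single row.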

Upvotes: 1
