Reputation: 167
Using the following data in a table:
| name | d1 | d2 | d3 | d4 | d5 | d6 | d7 | d8 |
|--------|-------|--------|--------|--------|--------|--------|--------|--------|
| matty | 116.7 | 17.88 | 16.1 | 9.731 | (null) | (null) | (null) | (null) |
| jana | 17.88 | 116.7 | 65.45 | 72.1 | (null) | (null) | (null) | (null) |
| chris | 72.1 | (null) | (null) | (null) | (null) | (null) | (null) | (null) |
| khaled | 9.731 | 116.7 | 17.88 | 53.1 | 2 | 85.2 | (null) | (null) |
| " | " | " | " | " | " | " | " | " |
| n | " | " | " | " | " | " | " | " |
how is it possible to identify the count of times that combination of values did appear amongst all rows in SQL?
The following is the output sample required:
116.7, 17.88 (3)
116.7, 17.88, 9.731 (2)
72.1 (2)
16.1 (1)
65.45 (1)
53.1 (1)
2 (1)
85.2 (1)
If it is not possible with SQL, any alternative method could do it?
Upvotes: 1
Views: 181
Reputation: 36244
There is nothing built in PostgreSQL for combination calculation, but you can write a function for it, f.ex.:
create or replace function combinations(variadic anyarray)
returns setof anyarray
language sql
immutable
called on null input
as $func$
with recursive e as (
select *
from unnest($1) with ordinality u(e, o)
where e is not null
),
r as (
select distinct on (e) array[e] ea, array[o] oa
from e
union all
select distinct on (oea) oea, oa || o
from r, e, lateral (select array_agg(u order by u) oea from unnest(ea || e) u) l
where o <> all(oa)
)
select ea
from r
$func$;
With this function, you can write a query like:
select combinations, count(*)
from table_name
cross join combinations(d1, d2, d3, d4, d5, d6, d7, d8)
group by 1
However, there will be a lot more combinations in your sample input than your sample output contains. (Maybe you just left them out to preserve space?)
http://rextester.com/NNVK84197
Notes:
variadic
).anyarray
). This is called poliformism. Also, because of returns setof anyarray
, it will return a complete result set (multiple rows) of the same array type.language sql
just simplifies the function body: it will not contain any advanced procedural language construct, like IF
or LOOP
anyway (language plpgsql
can contain these).e
alias unwinds the data from the input array, but preserves the ordering/indexing information in the o
field (see with ordinality
). This will be essential later, as we cannot use the values themselves to remove duplications (i.e. (2, 2)
should be a valid combination, as you commented earlier). NULL
s are discarded here.r
alias (thus the recursive
keyword after with
) will accumulate every combination. It starts with every single value. Then in each step, it appends an element, with another ordinality (index) from the original set (see where o <> all(oa)
). Because the elements order in a combination does not matter (as you commented), I ordered the element in a sub-query. Also, both of the recursive query parts use distinct on (<combination>)
to remove any possible repetition, which can be happening when multiple elements have the same value.LATERAL
join to calculate every combinations, for every rows. This step will multiply your table's original rows as many times as combinations exists for them. Then, we just need to use GROUP BY combinations
& COUNT(*)
each of them.Upvotes: 3
Reputation: 5060
I think you can try something like this (you have to add to obtain combination for 4,5,6,7 and 8 values : I stopped at 3 values.)
with CTE_001 as (
SELECT name,D1 AS XVAL FROM mytable2 WHERE D1 IS NOT NULL
UNION ALL
SELECT name,D2 FROM mytable2 WHERE D2 IS NOT NULL
UNION ALL
SELECT name,D3 FROM mytable2 WHERE D3 IS NOT NULL
UNION ALL
SELECT name,D4 FROM mytable2 WHERE D4 IS NOT NULL
UNION ALL
SELECT name,D5 FROM mytable2 WHERE D5 IS NOT NULL
UNION ALL
SELECT name,D6 FROM mytable2 WHERE D6 IS NOT NULL
UNION ALL
SELECT name,D7 FROM mytable2 WHERE D7 IS NOT NULL
UNION ALL
SELECT name,D8 FROM mytable2 WHERE D8 IS NOT NULL
)
SELECT CONCAT(XVAL1, ', ', XVAL2) AS LOV, COUNT(*) AS RC
FROM(
SELECT C1.NAME, C1.XVAL AS XVAL1, C2.XVAL AS XVAL2
FROM CTE_001 C1
INNER JOIN CTE_001 C2 ON C1.NAME = C2.NAME
WHERE C1.XVAL < C2.XVAL
) B
GROUP BY XVAL1, XVAL2
HAVING COUNT(*) >1
UNION ALL
SELECT CONCAT(XVAL1, ', ' , XVAL2,', ', XVAL3), COUNT(*) AS RC
FROM(
SELECT C1.NAME, C1.XVAL AS XVAL1, C2.XVAL AS XVAL2, C3.XVAL AS XVAL3
FROM CTE_001 C1
INNER JOIN CTE_001 C2 O
N C1.NAME = C2.NAME
INNER JOIN CTE_001 C3 ON C1.NAME = C3.NAME
WHERE C1.XVAL < C2.XVAL AND C1.XVAL < C3.XVAL AND C2.XVAL < C3.XVAL
) B
GROUP BY XVAL1, XVAL2, XVAL3
HAVING COUNT(*) >1
ORDER BY 2 DESC
Output:
lov rc
1 17.880, 116.700 3
2 9.731, 116.700 2
3 9.731, 17.880 2
4 9.731, 17.880, 116.700 2
Upvotes: 0
Reputation: 8103
In below case I am not imagining different combination for d1
,d2
. If both are same, you will get one value with count of 2
.
So assuming that the number of columns are finite and fixed, then you can do it with help of union
select concat(array_to_string(array_agg(col),',') ,' (', cnt ,')' ) as result
from
(
select col,count(*) cnt
from
( select d1 as col from table1
union all
select d2 from table1
union all
select d3 from table1
--similarly add other columns
) t
where col is not null
group by col
) t1
group by cnt
order by cnt desc;
Output
result
--------------------------
17.88,116.7 (3)
72.1,16.1,65.45,9.731 (1)
Else you have to create a procedure to get all the columns in a union and then do group by
and count
like above.
Upvotes: 0