Reputation: 167

Count of combinations of values in rows across the entire table

Using the following data in a table:

|   name |    d1 |     d2 |     d3 |     d4 |     d5 |     d6 |     d7 |     d8 |
|--------|-------|--------|--------|--------|--------|--------|--------|--------|
|  matty | 116.7 |  17.88 |   16.1 |  9.731 | (null) | (null) | (null) | (null) |
|   jana | 17.88 |  116.7 |  65.45 |   72.1 | (null) | (null) | (null) | (null) |
|  chris |  72.1 | (null) | (null) | (null) | (null) | (null) | (null) | (null) |
| khaled | 9.731 |  116.7 |  17.88 |   53.1 |      2 |   85.2 | (null) | (null) |
|    "   |   "   |   "    |   "    |   "    |   "    |   "    |   "    |   "    |
|    n   |   "   |   "    |   "    |   "    |   "    |   "    |   "    |   "    |

how is it possible to identify the count of times that combination of values did appear amongst all rows in SQL?

The following is the output sample required:

116.7, 17.88 (3)

116.7, 17.88, 9.731 (2)

72.1 (2)

16.1 (1)

65.45 (1)

53.1 (1)

2 (1)

85.2 (1)

If it is not possible with SQL, any alternative method could do it?

Upvotes: 1

Answers (3)

pozs

Reputation: 36244

There is nothing built in PostgreSQL for combination calculation, but you can write a function for it, f.ex.:

create or replace function combinations(variadic anyarray)
  returns setof anyarray
  language sql
  immutable
  called on null input
as $func$
  with recursive e as (
      select *
      from   unnest($1) with ordinality u(e, o)
      where  e is not null
  ),
  r as (
      select distinct on (e) array[e] ea, array[o] oa
      from   e
    union all
      select distinct on (oea) oea, oa || o
      from   r, e, lateral (select array_agg(u order by u) oea from unnest(ea || e) u) l
      where  o <> all(oa)
  )
  select ea
  from   r
$func$;

With this function, you can write a query like:

select     combinations, count(*)
from       table_name
cross join combinations(d1, d2, d3, d4, d5, d6, d7, d8)
group by   1

However, there will be a lot more combinations in your sample input than your sample output contains. (Maybe you just left them out to preserve space?)

http://rextester.com/NNVK84197

Notes:

The function above uses a variable number of parameters, which is converted to a native PostgreSQL array (because of variadic).
It accepts any type of input, as long as they are all of the same type (because of anyarray). This is called poliformism. Also, because of returns setof anyarray, it will return a complete result set (multiple rows) of the same array type.
language sql just simplifies the function body: it will not contain any advanced procedural language construct, like IF or LOOP anyway (language plpgsql can contain these).
The CTE with e alias unwinds the data from the input array, but preserves the ordering/indexing information in the o field (see with ordinality). This will be essential later, as we cannot use the values themselves to remove duplications (i.e. (2, 2) should be a valid combination, as you commented earlier). NULLs are discarded here.
The recursive CTE with r alias (thus the recursive keyword after with) will accumulate every combination. It starts with every single value. Then in each step, it appends an element, with another ordinality (index) from the original set (see where o <> all(oa)). Because the elements order in a combination does not matter (as you commented), I ordered the element in a sub-query. Also, both of the recursive query parts use distinct on (<combination>) to remove any possible repetition, which can be happening when multiple elements have the same value.
The solution query uses an implicit LATERAL join to calculate every combinations, for every rows. This step will multiply your table's original rows as many times as combinations exists for them. Then, we just need to use GROUP BY combinations & COUNT(*) each of them.

Upvotes: 3

etsa

Reputation: 5060

I think you can try something like this (you have to add to obtain combination for 4,5,6,7 and 8 values : I stopped at 3 values.)

    with CTE_001 as (
        SELECT name,D1 AS XVAL FROM mytable2 WHERE D1 IS NOT NULL
        UNION ALL
        SELECT name,D2 FROM mytable2 WHERE D2 IS NOT NULL
        UNION ALL
        SELECT name,D3 FROM mytable2 WHERE D3 IS NOT NULL
        UNION ALL
        SELECT name,D4 FROM mytable2 WHERE D4 IS NOT NULL
        UNION ALL
        SELECT name,D5 FROM mytable2 WHERE D5 IS NOT NULL
        UNION ALL
        SELECT name,D6 FROM mytable2 WHERE D6 IS NOT NULL
        UNION ALL
        SELECT name,D7 FROM mytable2 WHERE D7 IS NOT NULL
        UNION ALL
        SELECT name,D8 FROM mytable2 WHERE D8 IS NOT NULL
    )
    SELECT CONCAT(XVAL1, ', ', XVAL2) AS LOV, COUNT(*) AS RC 
    FROM(
     SELECT C1.NAME, C1.XVAL AS XVAL1, C2.XVAL AS XVAL2
        FROM CTE_001 C1 
        INNER JOIN CTE_001 C2 ON C1.NAME = C2.NAME
        WHERE C1.XVAL < C2.XVAL
        ) B 
        GROUP BY XVAL1, XVAL2
        HAVING COUNT(*) >1
    UNION ALL
    SELECT CONCAT(XVAL1, ', ' , XVAL2,', ', XVAL3), COUNT(*) AS RC 
    FROM(
    SELECT C1.NAME, C1.XVAL AS XVAL1, C2.XVAL AS XVAL2, C3.XVAL AS XVAL3
        FROM CTE_001 C1 
        INNER JOIN CTE_001 C2 O
N C1.NAME = C2.NAME
    INNER JOIN CTE_001 C3 ON C1.NAME = C3.NAME
    WHERE C1.XVAL < C2.XVAL  AND C1.XVAL < C3.XVAL AND C2.XVAL < C3.XVAL
    ) B
GROUP BY XVAL1, XVAL2, XVAL3
HAVING COUNT(*) >1
ORDER BY 2 DESC

Output:

    lov rc
1   17.880, 116.700 3
2   9.731, 116.700  2
3   9.731, 17.880   2
4   9.731, 17.880, 116.700  2

Upvotes: 0

Utsav

Reputation: 8103

In below case I am not imagining different combination for d1,d2. If both are same, you will get one value with count of 2.

So assuming that the number of columns are finite and fixed, then you can do it with help of union

Rextester Demo

select concat(array_to_string(array_agg(col),',')   ,' (',  cnt ,')' ) as result
from 
(
    select col,count(*) cnt
    from 
    (   select d1 as col from table1
        union all 
        select d2 from table1
        union all
        select d3 from table1
         --similarly add other columns
    ) t
    where col is not null
    group by col
) t1
group by cnt
order by cnt desc;

Output

result 
--------------------------
17.88,116.7 (3) 
72.1,16.1,65.45,9.731 (1)

Else you have to create a procedure to get all the columns in a union and then do group by and count like above.

Upvotes: 0

Count of combinations of values in rows across the entire table

Answers (3)

Related Questions