RDK

Reputation: 153

Create UDAF (not UDTF) in Snowflake

Java UDFs return a scalar result. Java UDTFs are not currently supported (reference).

That said, I created a Java UDF as given below:

CREATE OR replace function MAP_COUNT(colValue String)
returns OBJECT 
language java
handler='Frequency.calculate'
target_path='@~/Frequency.jar'
as
$$
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Optional;
    class Frequency {
        Map<String, Integer> frequencies = new HashMap<>();

        public Map<String, Integer> calculate(String colValue) {
            frequencies.putIfAbsent(colValue, 0);
            frequencies.computeIfPresent(colValue, (key, value) -> value + 1);
            return frequencies;
        }
    }
$$;

Using the MAP_COUNT UDF in a query as below:

with temp_1 as
(
    SELECT 'John' AS my_col, 27 as age
    UNION ALL
    SELECT 'John' AS my_col, 28 as age
    UNION ALL
    SELECT 'doe' AS my_col, 27 as age
    UNION ALL
    SELECT 'doe' AS my_col, 28 as age
)
select  MAP_COUNT(a.my_col) from temp_1 a;

I get the result below:

|MAP_COUNT(A.MY_COL)            |
|-------------------------------|
|{  "John": "1" }               |
|{  "John": "2" }               |
|{ "John": "2",  "doe": "1" }   |
|{  "John": "2",  "doe": "2"}   |

The result I expect from my UDF is as below:

|MAP_COUNT(A.MY_COL)            |
|-------------------------------|
|{  "John": "2",  "doe": "2"}   |

Is this possible in Snowflake?

What if I have a query like the one below?

with temp_1 as
(
    SELECT 'John' AS my_col, 27 as age
    UNION ALL
    SELECT 'John' AS my_col, 28 as age
    UNION ALL
    SELECT 'doe' AS my_col, 27 as age
    UNION ALL
    SELECT 'doe' AS my_col, 28 as age
)
select  MAP_COUNT(a.my_col) as names, MAP_COUNT(a.age) as ages  from temp_1 a;

The result I expect from my UDF is as below:

|NAMES                          |AGES                           |
|-------------------------------|-------------------------------|
|{  "John": "2",  "doe": "2"}   |{  "27": "2",  "28": "2"}      |

There are ways to achieve this by simply restructuring the query, but I want to know whether the MAP_COUNT function can be used in the SELECT clause like an aggregate function such as OBJECT_AGG.
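
For reference, here is a sketch of one such restructured query using the built-in OBJECT_AGG aggregate (which takes a VARCHAR key and a VARIANT value per group): count each distinct value first, then fold the value/count pairs into a single OBJECT.

with temp_1 as
(
    SELECT 'John' AS my_col, 27 as age
    UNION ALL
    SELECT 'John' AS my_col, 28 as age
    UNION ALL
    SELECT 'doe' AS my_col, 27 as age
    UNION ALL
    SELECT 'doe' AS my_col, 28 as age
)
-- fold the (value, count) pairs into a single OBJECT
select OBJECT_AGG(my_col, cnt::variant) as names
from (
    -- count each distinct value first
    select my_col, count(*) as cnt
    from temp_1
    group by my_col
);

This should return a single row like { "John": 2, "doe": 2 }, but it requires the GROUP BY subquery rather than a single aggregate call over the raw rows.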

Upvotes: 2

Views: 1269

Answers (2)

Stuart Ozer

Reputation: 1384

You can achieve much of this desired functionality using JavaScript UDTFs today. In particular, a UDTF can be configured to return only an (aggregated) result at the end of a "grouping". Here is an example:

CREATE OR REPLACE FUNCTION MAP_COUNT(COLVALUE varchar)
  RETURNS TABLE (FREQUENCIES variant)
  LANGUAGE JAVASCRIPT
  AS '
  {
    initialize: function (argumentInfo, context) {
        // Plain object used as the frequency accumulator; it serializes
        // directly into the VARIANT output column.
        this.freq = {};
    },

    processRow: function (row, rowWriter, context) {
        // Increment the count for this value; no row is written yet.
        var freqVal = this.freq[row.COLVALUE];
        this.freq[row.COLVALUE] = (freqVal === undefined ? 1 : 1 + freqVal);
    },

    finalize: function (rowWriter, context) {
        // Emit a single aggregated row at the end of the partition.
        rowWriter.writeRow({FREQUENCIES: this.freq});
    }
}
'
;

Create a temp table to test this on:

create or replace temporary table mytemp as
with temp_1 as
(
    SELECT 'John' AS my_col, 27 as age
    UNION ALL
    SELECT 'John' AS my_col, 28 as age
    UNION ALL
    SELECT 'doe' AS my_col, 27 as age
    UNION ALL
    SELECT 'doe' AS my_col, 28 as age
)
select * from temp_1;

Run the UDTF over the whole table to produce a single aggregated result:

select agg.* from myTemp,
table(map_count(my_col)) agg
;
    Result:  { "John": 2, "doe": 2 }

Run the UDTF partitioned by separate groupings:

select my_col, agg.* from myTemp,
table(map_count(age::varchar) over (partition by my_col)) agg
;
    Result:
    doe   { "27": 1, "28": 1 }
    John  { "27": 1, "28": 1 }
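
To get both maps in a single row, as in the question's second expected output, a minimal (untested) sketch is to run the UDTF once per column using the same single-partition pattern and put the two one-row results side by side:

with names as (
    -- whole-table aggregation of the name column (no PARTITION BY)
    select agg.frequencies as names
    from myTemp, table(map_count(my_col)) agg
),
ages as (
    -- whole-table aggregation of the age column
    select agg.frequencies as ages
    from myTemp, table(map_count(age::varchar)) agg
)
select names.names, ages.ages
from names cross join ages;

Each CTE should produce one aggregated row, so the cross join yields a single row containing both objects.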

Upvotes: 0

Elliott Brossard

Reputation: 33745

When you run a query that uses a UDF, not all rows will necessarily go to the same instance of the UDF. For example, let's say that you're selecting from a table, and you do:

SELECT MyUdf(x) FROM T

Here T may have multiple micro-partitions, and the way that it executes is actually similar to:

SELECT MyUdf(x) FROM T_part1 UNION ALL
SELECT MyUdf(x) FROM T_part2 UNION ALL
SELECT MyUdf(x) FROM T_part3 UNION ALL
SELECT MyUdf(x) FROM T_part4

Here there are four separate instances of MyUdf, and each one sees just a subset of the rows from T as a whole.

Going back to your example, you're trying to emulate a user-defined aggregate function, where a particular instance of the UDF sees every row. The way to guarantee this would be to aggregate in advance, e.g.:

CREATE OR replace function MAP_COUNT(colValue array)
returns OBJECT 
language java
handler='Frequency.calculate'
target_path='@~/Frequency.jar'
as
$$
    import java.util.HashMap;
    import java.util.Map;
    class Frequency {
        public Map<String, Integer> calculate(String[] colValues) {
            // A single invocation receives the entire pre-aggregated array,
            // so every row's value is counted in one frequency map.
            Map<String, Integer> frequencies = new HashMap<>();
            for (String colValue : colValues) {
                frequencies.putIfAbsent(colValue, 0);
                frequencies.computeIfPresent(colValue, (key, value) -> value + 1);
            }
            return frequencies;
        }
    }
$$;

(Note that I changed the UDF and method signatures to use array and String[], respectively.) Now use it in a query:

with temp_1 as
(
    SELECT 'John' AS my_col, 27 as age
    UNION ALL
    SELECT 'John' AS my_col, 28 as age
    UNION ALL
    SELECT 'doe' AS my_col, 27 as age
    UNION ALL
    SELECT 'doe' AS my_col, 28 as age
)
select
  MAP_COUNT(ARRAY_AGG(a.my_col)) as names,
  MAP_COUNT(ARRAY_AGG(a.age)) as ages
from temp_1 a;

This gives me:

names                        ages
{ "John": "2", "doe": "2" }  { "27": "2", "28": "2" }

There are still two problems here, notably:

  • This doesn't scale very well. If the size of either array exceeds 16MB (the maximum value size), the query will fail.
  • The syntax is clunky. Ideally, you'd just use the UDF like any other aggregate function rather than having to wrap your inputs in ARRAY_AGG.

The good news is that both of these problems will be addressed once Java UDAFs are available at some point in the future.

Upvotes: 2
