mkersten
mkersten

Reputation: 714

Handling Python UDFs in MonetDB

The blending of Python and SQL in MonetDB is a great start to move a lot of the business logic into the database server. However, the current documentation contains some hurdles to take by a novice in this game. Consider the following function:

sql>select * from getsourceattributes('tables');
+---------------+
| c             |
+===============+
| id            |
| name          |
| schema_id     |
| query         |
| type          |
| system        |
| commit_action |
| access        |
| temporary     |
+---------------+

and the following table to gather some stats:

create table dummy(tbl string, col string, stat integer);

Now knowing that ptbl is an numpy array, i tried the following:

create function gatherStatistics(ptbl string)
returns string
language python {
     for p in ptbl:
         attr = _conn.execute("select * from getSourceAttributes('"+ str(p) +"');")
         for col in attr :
             stat = _conn.execute("select count(*) from "+ str(p) +";")
             _conn_execute("insert into dummy values('"+ str(p)+"','"+ str(col) +"',"+ str(stat)+");")
     return ptbl;
};

and calling with

select gatherstatistics('tables');
SELECT: no such table 't'
Python exception
   3.     attr = _conn.execute("select * from getSourceAttributes('"+ str(p) +"');")
   4.     for col in attr :

     5.       stat = _conn.execute("select count(*) from "+ str(p) +";")

   6.       _conn_execute("insert into dummy values('"+ str(p)+"','"+ str(col) +"',"+str(stat)+");")
   7.   return ptbl;
SQL Query Failed: ParseException:SQLparser:42S02!SELECT: no such table 't'

Upvotes: 0

Views: 384

Answers (1)

Mytherin
Mytherin

Reputation: 151

The problem you have is that you treat a scalar string as an array. You call SELECT gatherstatistics('tables'); and iterate over the values of ptbl. In this case, you are iterating over the characters of the string 'tables', the first of which is the character 't'.

The variables in Python UDFs are converted to either (1) NumPy arrays, if the function is called with a column as input, or (2) Scalar values, if the function is called with a scalar value as input.

Consider the following function, that returns the type of its input parameters as a string:

CREATE FUNCTION get_type(s STRING)
RETURNS STRING
LANGUAGE PYTHON
{
    return str(type(s))
};

Now if we query it using a scalar value we get the following output:

SELECT get_type('hello');
+------------------+
| L2               |
+==================+
| <type 'unicode'> |
+------------------+

But if we query it with a column we get the following output:

SELECT get_type(name) FROM tables;
+------------------------+
| L40                    |
+========================+
| <type 'numpy.ndarray'> |
+------------------------+

Generally you would create a function that accepts only one as valid input for a given parameter. For example, you could create a function called log that takes as input an array of numeric values, and a scalar base for the logarithm.

If you want to create a function that operates on both scalar values and array values, a simple way of doing so is checking the type of the input parameters. If it is not a NumPy array, we convert it to one.

For example, consider the following function that reverses a string and operates on both arrays and scalar values:

CREATE OR REPLACE FUNCTION reverse_string(s STRING)
RETURNS STRING
LANGUAGE PYTHON
{
    if not isinstance(s, numpy.ndarray):
        s = numpy.array([s])
    return [x[::-1] for x in s]
};
SELECT reverse_string('hello');
+-------+
| L2    |
+=======+
| olleh |
+-------+
SELECT reverse_string(name) FROM tables LIMIT 3;
+-----------+
| L40       |
+===========+
| samehcs   |
| sepyt     |
| snoitcnuf |
+-----------+

Upvotes: 1

Related Questions