Reputation: 287
I am trying to define a recursive function (to find the Levenshtein distance between two strings). I have defined a dataset by prepending its name to the function's name.
As a possible fix, I tried removing the SQUAD_V11 prefix from the recursive calls.
CREATE OR REPLACE FUNCTION SQUAD_V11.levenshteinDistance(s STRING,t STRING)
RETURNS INT64
AS
(
/* function levenshteinDistance (s, t) { */
/* if (!s.length) return t.length*/
CASE
WHEN (LENGTH(s) = 0) THEN LENGTH(t)
/* if (!t.length) return s.length; */
WHEN (LENGTH(t) = 0) THEN LENGTH(s)
/* return Math.min(*/
ELSE
(SELECT MIN(distances) FROM UNNEST(
/* levenshteinDistance(s.substr(1), t) + 1,*/
[SQUAD_V11.levenshteinDistance(SUBSTR(s,1),t) + 1,
/* levenshteinDistance(t.substr(1), s) + 1,*/
SQUAD_V11.levenshteinDistance(SUBSTR(t,1),s) + 1,
/* levenshteinDistance(s.substr(1), t.substr(1)) + (s[0] !== t[0] ? 1 : 0)*/
SQUAD_V11.levenshteinDistance(SUBSTR(s,1),SUBSTR(t,1)) + (SUBSTR(s,0,1) != SUBSTR(t,0,1))]))
/* ) + 1;*/
END
/* }*/
);
I need to be able to call my function recursively. I should be able to call the function from the function; logically, this should not require different syntax than if I were to call it from a SELECT statement. However, when I try to do this, I get, "Error: Function not found: levenshteinDistance at [15:8]."
Upvotes: 0
Views: 286
Reputation: 173046
Unfortunately BigQuery does not support recursive UDF calls! But you can implement whole recursive logic inside one JS UDF (note: not SQL UDF) and then call it from your SQL
Below is example of such implementation
#standardSQL
CREATE OR REPLACE FUNCTION SQUAD_V11.levenshteinDistance(s STRING, t STRING)
RETURNS INT64
LANGUAGE js AS """
var _extend = function(dst) {
var sources = Array.prototype.slice.call(arguments, 1);
for (var i=0; i<sources.length; ++i) {
var src = sources[i];
for (var p in src) {
if (src.hasOwnProperty(p)) dst[p] = src[p];
}
}
return dst;
};
var Levenshtein = {
/**
* Calculate levenshtein distance of the two strings.
*
* @param str1 String the first string.
* @param str2 String the second string.
* @return Integer the levenshtein distance (0 and above).
*/
get: function(str1, str2) {
// base cases
if (str1 === str2) return 0;
if (str1.length === 0) return str2.length;
if (str2.length === 0) return str1.length;
// two rows
var prevRow = new Array(str2.length + 1),
curCol, nextCol, i, j, tmp;
// initialise previous row
for (i=0; i<prevRow.length; ++i) {
prevRow[i] = i;
}
// calculate current row distance from previous row
for (i=0; i<str1.length; ++i) {
nextCol = i + 1;
for (j=0; j<str2.length; ++j) {
curCol = nextCol;
// substution
nextCol = prevRow[j] + ( (str1.charAt(i) === str2.charAt(j)) ? 0 : 1 );
// insertion
tmp = curCol + 1;
if (nextCol > tmp) {
nextCol = tmp;
}
// deletion
tmp = prevRow[j + 1] + 1;
if (nextCol > tmp) {
nextCol = tmp;
}
// copy current col value into previous (in preparation for next iteration)
prevRow[j] = curCol;
}
// copy last col value into previous (in preparation for next iteration)
prevRow[j] = nextCol;
}
return nextCol;
}
};
var the_s;
try {
the_s = decodeURI(s).toLowerCase();
} catch (ex) {
the_s = s.toLowerCase();
}
try {
the_t = decodeURI(t).toLowerCase();
} catch (ex) {
the_t = t.toLowerCase();
}
return Levenshtein.get(the_s, the_t)
""";
If to apply to dummy data as in below example
WITH `project.dataset.table` AS (
SELECT 'mikhail' string1, 'mike' string2 UNION ALL
SELECT 'jon', 'john'
)
SELECT string1, string2, SQUAD_V11.levenshteinDistance(string1, string2) changes
FROM `project.dataset.table`
result will be
Row string1 string2 changes
1 mikhail mike 4
2 jon john 1
Upvotes: 2