Reputation: 11017
I have written a Hive UDF that does decryption using an in-house API as follows:
public Text evaluate(String customer) {
String result = new String();
if (customer == null) { return null; }
try {
result = com.voltage.data.access.Data.decrypt(customer.toString(), "name");
} catch (Exception e) {
return new Text(e.getMessage());
}
return new Text(result);
}
and Data.decrypt does:
public static String decrypt(String data, String type) throws Exception {
configure();
String FORMAT = new String();
if (type.equals("ccn")) {
FORMAT = "CC";
} else if (type.equals("ssn")) {
FORMAT = "SSN";
} else if (type.equals("name")) {
FORMAT = "AlphaNumeric";
}
return library.FPEAccess(identity, LibraryContext.getFPE_FORMAT_CUSTOM(),String.format("formatName=%s", FORMAT),authMethod, authInfo, data);
}
where configure() creates a pretty expensive context object.
My question is: Does Hive execute this UDF once for every row returned by the query? i.e. If I'm selecting 10,000 rows, does the evaluate method get run 10,000 times?
My gut instinct tells me yes. And if so, then here's a second question:
Is there any way I can do one of the following:
a) run configure() once when the query first starts, then share the context object
b) instead of the UDF returning a decrypted string, it aggregates the encrypted string into some Set, then I do a bulk decrypt on the set?
Thanks in advance
Upvotes: 2
Views: 1703
Reputation: 18424
Is configure()
something that needs to be called once per JVM, or once per instance of the UDF class?
If once per JVM, just put it in a static block in the class, like so:
static {
configure();
}
If once per instance, put it in the constructor:
public [class name]() {
super();
configure();
}
Upvotes: 2