Matthew Rathbone

Reputation: 8259

Does Hive instantiate a new UDF object for each record?

Say I'm building a UDF class called StaticLookupUDF that has to load some static data from a local file during construction.

I want to avoid repeating work unnecessarily: the static data should not be re-loaded on every call to the evaluate() method.

Clearly each mapper uses its own instantiation of the UDF, but does a new instance get generated for each record processed?

For example, say a mapper is going to process 3 rows. Does it create a single StaticLookupUDF and call evaluate() 3 times, or does it create a new StaticLookupUDF for each record and call evaluate() only once per instance?

If the second case is true, how else should I structure this?

I couldn't find this anywhere in the docs. I'm going to look through the code, but figured I'd ask the smart people here at the same time.

Upvotes: 5

Views: 1799

Answers (1)

Matthew Rathbone

Reputation: 8259

I'm still not totally sure about this, but I got around it by using a static lazy value that loads the data the first time it is needed.

This way you have one instance of the static value per mapper. So if you're reading in a dataset and you have 6 map tasks, you'll read in the data 6 times. Not ideal, but better than once per record.
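For anyone wanting to see the shape of it, here is a minimal sketch of the static lazy value idea in plain Java, using the initialization-on-demand holder idiom. Treat it as an illustration under assumptions rather than the exact code I used: the `Holder` class, `loadData()` and `LOAD_COUNT` are hypothetical names, the lookup data is hard-coded as a stand-in for reading the local file, and the class does not extend Hive's `org.apache.hadoop.hive.ql.exec.UDF` so that it compiles without Hive on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch. In a real Hive UDF this class would extend
// org.apache.hadoop.hive.ql.exec.UDF; that is left out here so the
// example compiles on its own.
class StaticLookupUDF {

    // Counts how many times the data was loaded (demonstration only).
    static final AtomicInteger LOAD_COUNT = new AtomicInteger();

    // Initialization-on-demand holder: the JVM initializes Holder, and
    // therefore runs loadData(), the first time Holder.DATA is accessed.
    // Class initialization is thread-safe and happens once per class loader.
    private static final class Holder {
        static final Map<String, String> DATA = loadData();
    }

    // Stand-in for reading the static data from a local file.
    private static Map<String, String> loadData() {
        LOAD_COUNT.incrementAndGet();
        Map<String, String> data = new HashMap<>();
        data.put("us", "United States");
        data.put("de", "Germany");
        return data;
    }

    // Called once per record; only the first call that reaches
    // Holder.DATA pays the loading cost.
    public String evaluate(String key) {
        if (key == null) {
            return null;
        }
        return Holder.DATA.get(key);
    }
}
```

Because the data hangs off a static field instead of an instance field, it does not matter how many StaticLookupUDF objects get created inside one JVM: they all share the same loaded map.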

Upvotes: 2
