Shubham Patil

Reputation: 148

How to use external library in python UDF on hive?

I want to transform a Hive table (HDFS, spot instances) using a Python UDF, for which I need the external library "user-agents". My UDF works fine without the external library, but I cannot get it working once I try to use the library.

I tried installing the library from within the UDF code itself, as shown below.

import sys
import subprocess
import pip
import os



sys.stdout = open(os.devnull, 'w+')
pip.main(['install', '--user', 'pyyaml'])
pip.main(['install', '--user', 'ua-parser'])
pip.main(['install', '--user', 'user-agents'])
sys.stdout = sys.__stdout__

After that, I tried to import it:

import user_agents

However, the UDF crashes with a "No module found" exception. I also checked the following paths from within the code:

/usr/local/lib/python2.7/site-packages
/usr/local/lib64/python2.7/site-packages

But no user_agents module was there. Any help on how to get this working would be really appreciated. Thanks!

Upvotes: 1

Views: 1148

Answers (1)

Shubham Patil

Reputation: 148

I figured out a way around this. If you are facing the same UDF issue and have not solved it yet, try this solution and see whether it works for you too.

For external libraries, do the following steps:

Step 1: Have pip install the external library from within the code itself, into the current working directory of your UDF.

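import sys
import os
import pip

# Hive reads the UDF's stdout as its output rows, so silence pip while it runs.
sys.stdout = open(os.devnull, 'w+')
# Note: pip.main() is only available in pip versions before 10.
# '-t' installs into the given directory instead of site-packages.
pip.main(['install', 'user-agents', '-t', os.getcwd(), '--ignore-installed'])
# Restore the real stdout so the UDF can emit rows again.
sys.stdout = sys.__stdout__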

Step 2: Update your sys.path

sys.path.append(os.getcwd())

Step 3: Now import the library :)

from user_agents import parse 
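
The three steps above can also be sketched without calling pip.main(), which newer pip releases no longer expose. This is only a sketch under assumptions: the helper names build_pip_command and install_into are hypothetical, and it assumes the interpreter running the UDF has pip available as a module and that the node can reach a package index.

```python
import os
import subprocess
import sys


def build_pip_command(package, target_dir):
    """Return the argv list that installs `package` into `target_dir`."""
    # Run pip through the same interpreter that is executing the UDF.
    return [sys.executable, "-m", "pip", "install",
            "--target", target_dir, "--ignore-installed", package]


def install_into(package, target_dir):
    """Install `package` into `target_dir` and make that directory importable."""
    with open(os.devnull, "w") as devnull:
        # Hive treats the UDF's stdout as output rows, so pip's stdout is
        # discarded. stderr is left alone so failures still reach the task logs.
        subprocess.check_call(build_pip_command(package, target_dir),
                              stdout=devnull)
    if target_dir not in sys.path:
        sys.path.append(target_dir)
```

Calling install_into("user-agents", os.getcwd()) before the import in Step 3 would then cover Steps 1 and 2. Note that this reinstalls the package every time the script starts, which may be slow for large jobs.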

That's it. Please check and confirm if this works for you too.
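
If the import still fails, it may help to log where the UDF's interpreter actually looks for modules. A "--user" install goes to the per-user site directory of whichever account runs the script, not to /usr/local/lib, so checking only the system site-packages paths can be misleading. Below is a minimal diagnostic sketch; the function names are hypothetical, and it writes to stderr on the assumption that stdout is reserved for the rows returned to Hive.

```python
import site
import sys


def module_search_report():
    """Return lines describing where this interpreter looks for modules."""
    lines = ["executable: %s" % sys.executable]
    # Directory that `pip install --user` uses for the current account.
    lines.append("user site: %s (enabled: %s)"
                 % (site.getusersitepackages(), site.ENABLE_USER_SITE))
    # Every directory searched on import, in order.
    lines.extend("sys.path: %s" % entry for entry in sys.path)
    return lines


def log_report():
    """Write the report to stderr so the UDF's stdout stays clean."""
    sys.stderr.write("\n".join(module_search_report()) + "\n")
```

Calling log_report() at the top of the UDF should make the search paths show up in the task's error log, where they can be compared with the directory pip installed into.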

Upvotes: 2
