Reputation: 12846
I am trying to learn how to use Python UDFs with Hive.
I have a very basic Python UDF here:
import sys
for line in sys.stdin:
    line = line.strip()
    print line
Then I add the file in Hive:
ADD FILE /home/hadoop/test2.py;
Now I run the Hive query:
SELECT TRANSFORM (admission_type_id, description)
USING 'python test2.py'
FROM admission_type;
This works as expected: no changes are made to the fields and the output is printed as is.
Now, when I modify the UDF by introducing the split function, I get an execution error. How do I debug this, and what am I doing wrong?
New UDF:
import sys
for line in sys.stdin:
    line = line.strip()
    fields = line.split('\t')  # when this line is introduced, I get an execution error
    print line
Upvotes: 0
Views: 3111
Reputation: 11
import sys
for line in sys.stdin:
    line = line.strip()
    field1, field2 = line.split('\t')
    print '\t'.join([str(field1), str(field2)])
SELECT TRANSFORM (admission_type_id, description)
USING 'python test2.py' AS (admission_type_id_new, description_new)
FROM admission_type;
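On the debugging part of the question, one possible approach is to put the row handling in a function, so it can be tried locally on a few sample lines before Hive runs it, and to report unexpected rows on stderr instead of letting the script crash. The following is only a rough sketch: the function name transform and the two-column check are assumptions made for illustration, and it assumes that whatever the script writes to stderr ends up in the logs of the failed task.

```python
import sys


def transform(lines, out=sys.stdout, err=sys.stderr):
    """Echo two-column, tab-separated rows; report malformed rows on err."""
    for number, line in enumerate(lines, 1):
        # Strip only the newline so that empty first or last fields survive.
        fields = line.rstrip('\n').split('\t')
        if len(fields) != 2:
            # Hypothetical diagnostic format; adjust to taste.
            err.write('row %d: expected 2 fields, got %r\n' % (number, fields))
            continue
        out.write('\t'.join(fields) + '\n')
```

To use this as the TRANSFORM script, add a final line that calls transform(sys.stdin). Piping a couple of tab-separated lines into python test2.py on the command line then exercises the same code path without involving Hive, which usually shows a Python traceback (for example an IndentationError from mixed tabs and spaces) much more directly than the Hive execution error does.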
Upvotes: 1