Reputation: 99
I am a newbie to Hadoop and Python and am facing an issue. I'd appreciate your help...
I have a file of, say, 150 records (just a sample) with 10 columns each, which was loaded into a Hive table (table1). Column no. 10 (let's call it col10) is URL-encoded (percent-escaped UTF-8), so to decode it I have written a small Python script (named pyfile.py), which is as follows:
Python script:
import sys
import urllib

for line in sys.stdin:
    line = line.strip()
    col10 = urllib.unquote(line).decode('utf8')
    print ''.join(col10.replace("+", ' '))
I added the file in distributed cache using the following command:
add FILE folder1/pyfile.py;
Now, I am calling this Python script on col10 of my Hive table using TRANSFORM as follows:
SELECT TRANSFORM(col10)
USING 'python pyfile.py'
AS (col10)
FROM table1;
Issue faced:
The issue is that when I call it on the first 100 records of the table, it works perfectly fine, but it fails for records 101-150 with the following error:
2015-10-30 00:58:20,320 INFO [IPC Server handler 0 on 33716] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1445826741287_0032_m_000000_0: Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:217)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An error occurred when trying to close the Operator running your custom script.
at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:557)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
... 8 more
I copied records 101-150 to a text file, ran the Python script on them separately, and found that it runs fine.
Could you please help me figure out why it is throwing this error?
Upvotes: 4
Views: 5288
Reputation: 36545
The error message you are seeing means that your Python script is throwing an exception. One thing that has worked for me when debugging this kind of thing is to use the following pattern in my UDF code (see also my blog post about this):
import sys
import urllib

try:
    for line in sys.stdin:
        line = line.strip()
        col10 = urllib.unquote(line).decode('utf8')
        print ''.join(col10.replace("+", ' '))
except:
    # In case of an exception, write the stack trace to stdout so that we
    # can see it in Hive, in the results of the UDF call.
    print sys.exc_info()
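Going a step further: the usual culprit in this situation is a row whose bytes are not valid UTF-8, which makes `decode('utf8')` raise a `UnicodeDecodeError` and kills the whole script, failing the task for every remaining row. A more defensive variant (this is a sketch of the idea, not the poster's code, and it is written for Python 3, where `urllib.unquote` has become `urllib.parse.unquote`) catches the error per line, logs the traceback to stderr so it shows up in the Hive task logs, and emits Hive's NULL marker `\N` so the rest of the rows still get processed:

```python
import sys
import traceback
import urllib.parse  # Python 3; on Python 2 use urllib.unquote instead


def decode_line(line):
    """Percent-decode a line and map '+' back to spaces.

    errors="strict" makes malformed UTF-8 raise UnicodeDecodeError
    instead of being silently replaced, so the caller can react.
    """
    return urllib.parse.unquote(line.strip(), errors="strict").replace("+", " ")


def main():
    for line in sys.stdin:
        try:
            print(decode_line(line))
        except Exception:
            # Log the failure to stderr (visible in the Hive task logs)
            # and emit Hive's NULL marker so the row count stays intact.
            traceback.print_exc(file=sys.stderr)
            print("\\N")


if __name__ == "__main__":
    main()
```

With this in place, a single bad row produces a NULL in the result instead of an [Error 20003] for the whole query, and the task log tells you exactly which exception fired.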
Upvotes: 1