Reputation: 99
I am a newbie to Hadoop and Python and am facing an issue. I'd appreciate your help...
I have a file of, say, 150 records (just a sample) with 10 columns each, which was loaded into a Hive table (table1). Column no. 10 (let's call it col10) is URL-encoded (percent-escaped UTF-8), so to decode it I have written a small Python script (named pyfile.py), which is as follows:
Python script:
import sys
import urllib

for line in sys.stdin:
    line = line.strip()
    col10 = urllib.unquote(line).decode('utf8')
    print ''.join(col10.replace("+", ' '))
I added the file in distributed cache using the following command:
add FILE folder1/pyfile.py;
Now, I am calling this Python script on col10 of my Hive table using TRANSFORM as follows:
SELECT TRANSFORM(col10)
USING 'python pyfile.py'
AS (col10)
FROM table1;
Issue faced:
The issue is that when I call it on the first 100 records of the table, it works perfectly fine, but it fails for records 101-150 with the following error:
2015-10-30 00:58:20,320 INFO [IPC Server handler 0 on 33716] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1445826741287_0032_m_000000_0: Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:217)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An error occurred when trying to close the Operator running your custom script.
at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:557)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
... 8 more
I copied records 101-150 to a text file, ran the Python script on them separately, and found that it runs fine.
Could you please help me figure out why it is throwing this error?
Upvotes: 4
Views: 5288
Reputation: 36545
The error message you are seeing means that your Python script is throwing an exception. One thing that has worked for me when debugging this kind of thing is to use the following pattern in my UDF code (see also my blog post about this):
import sys
import urllib

try:
    for line in sys.stdin:
        line = line.strip()
        col10 = urllib.unquote(line).decode('utf8')
        print ''.join(col10.replace("+", ' '))
except:
    # In case of an exception, write the stack trace to stdout so that we
    # can see it in Hive, in the results of the UDF call.
    print sys.exc_info()
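Going a step further: the usual culprit in this situation is a row whose bytes are not valid UTF-8, which makes `decode('utf8')` raise a `UnicodeDecodeError` and kills the whole script, failing the task for every remaining row. A more defensive variant (this is a sketch of the idea, not the poster's code, and it is written for Python 3, where `urllib.unquote` has become `urllib.parse.unquote`) catches the error per line, logs the traceback to stderr so it shows up in the Hive task logs, and emits Hive's NULL marker `\N` so the rest of the rows still get processed:

```python
import sys
import traceback
import urllib.parse  # Python 3; on Python 2 use urllib.unquote instead


def decode_line(line):
    """Percent-decode a line and map '+' back to spaces.

    errors="strict" makes malformed UTF-8 raise UnicodeDecodeError
    instead of being silently replaced, so the caller can react.
    """
    return urllib.parse.unquote(line.strip(), errors="strict").replace("+", " ")


def main():
    for line in sys.stdin:
        try:
            print(decode_line(line))
        except Exception:
            # Log the failure to stderr (visible in the Hive task logs)
            # and emit Hive's NULL marker so the row count stays intact.
            traceback.print_exc(file=sys.stderr)
            print("\\N")


if __name__ == "__main__":
    main()
```

With this in place, a single bad row produces a NULL in the result instead of an [Error 20003] for the whole query, and the task log tells you exactly which exception fired.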
Upvotes: 1