Reputation: 23
We are trying to write Hive UDFs in Python to clean our data. The UDF that uses Pandas throws the error below, while another Python script without Pandas works fine. We have already tried several Pandas variations with no luck; since the non-Pandas script works, we are confused about why this one fails. Kindly help us understand the problem. The Pandas code:
import sys

import numpy as np
import pandas as pd

for line in sys.stdin:
    df = line.split('\t')
    df1 = pd.DataFrame(df)
    df2 = df1.T
    df2[0] = np.where(df2[0].str.isalpha(), df2[0], np.nan)
    df2[1] = np.where(df2[1].astype(str).str.isdigit(), df2[1], np.nan)
    df2[2] = np.where(df2[2].astype(str).str.len() != 10, np.nan,
                      df2[2].astype(str))
    # df2[3] = np.where(df2[3].astype(str).str.isdigit(), df2[3], np.nan)
    df2 = df2.dropna()
    print(df2)
I get this error:
FAILED: Execution Error, return code 20003 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. An error occurred when trying to close the Operator running your custom script.
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
Upvotes: 0
Views: 224
Reputation: 1126
I think you'll need to look at the detailed job logs for more information. My first guess is that Pandas is not installed on a data node.
If you intend to bundle dependencies with your job, this answer looks appropriate for you: https://stackoverflow.com/a/2869974/7379644
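Alternatively, since you say plain Python scripts work fine on the cluster, you could drop the Pandas dependency entirely. Below is a minimal sketch of the same row-cleaning logic in pure Python; it assumes the first three tab-separated fields are the ones your column checks target (alphabetic, numeric, and exactly 10 characters), and simply drops any row that fails a check, mimicking your dropna():

```python
import sys


def clean(fields):
    """Apply the same checks as the pandas version, in pure Python.

    Returns the cleaned row, or None if any check fails (the
    equivalent of the row being dropped by dropna()).
    """
    if len(fields) < 3:
        return None
    name, number, code = fields[0], fields[1], fields[2]
    if not name.isalpha():          # df2[0]: must be alphabetic
        return None
    if not number.isdigit():        # df2[1]: must be numeric
        return None
    if len(code) != 10:             # df2[2]: must be exactly 10 chars
        return None
    return [name, number, code]


if __name__ == "__main__":
    for line in sys.stdin:
        cleaned = clean(line.rstrip("\n").split("\t"))
        if cleaned is not None:
            # Hive TRANSFORM expects tab-separated columns on stdout
            print("\t".join(cleaned))
```

Note that emitting tab-separated fields (rather than printing a DataFrame's repr) is also what Hive's TRANSFORM expects back from a streaming script.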
Upvotes: 0