John Deer

Reputation: 35

Error Passing Relation to Python UDF in Pig

I am trying to pass a relation to a Python UDF in Pig, but it throws an error. Below are my Pig Latin script, Python script, and error log.

REGISTER '/home/cloudera/jython-installer-2.7.0.jar';
REGISTER '/home/cloudera/Code.py' USING jython as myfunc;
A = LOAD '/home/cloudera/Link.txt' as (line:chararray);
B = FOREACH A GENERATE myfunc.codefunc(line);

//Python Script

import pandas as pd
def count(A,  crime):
    with open(A, 'r', encoding='UTF8') as fileA:
        data = fileA.read().lower()
        count = data.count(crime.lower())
        return count
def codefunc(A):
    crime = ['Rape', 'Murder', 'Extortion', 'Felony', 'Burglary', 'Property Damage', 'Arrest', 'Political Unrest', 'Civil Unrest', 'Solitication', 'Larceny', 'Abettor', 'Trafficking', 'Tresspasser', 'Robbery']
    crimecount = {}
    for i in range(len(crime)):
            crimecount[crime[i]] = count(A, crime[i])
    final_count = pd.DataFrame(list(crimecount.items()), columns = ['Crime', 'Value'])
    final_count['Percentage'] = 0
    total_count = final_count['Values'].sum()
    for i in range(0, final_count.last_valid_index()+1):
            final_count['Percentage'][i] = float((final_count['Values'][i]/total_count)*100.0)
    final_count.sort_values(by=['Percentage'], ascending=False)
    final_count.to_csv('/home/cloudera/solution.csv', header=0)

//Error Log Link of Error Log

I have supplied the link to where the dataset resides and passed that link from Pig to Python. Python should follow the link, read the dataset, and execute the code. I am confident the Python code is fine, but Pig throws an error at relation 'B'. I tried pasting the error log here, but Stack Overflow would not let me, so I have linked it instead. Sorry for the inconvenience. Can anyone please help? Thanks in advance.

Upvotes: 2

Views: 91

Answers (1)

D Sai Krishna

Reputation: 178

Your Pig script is fine. The problem is with Jython. Jython runs on the JVM and cannot import pandas, because pandas (and the NumPy layer underneath it) relies on C extension modules that only CPython can load. So cheer up!
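If it helps, here is a rough sketch of how the same counting could be done without pandas, using only modules that Jython 2.7 ships with. Treat it as an illustration rather than a drop-in fix: the name `count_terms`, the output schema string, and the no-op fallback for `outputSchema` are my own assumptions, and I have not run it against your data. It uses `io.open` because Jython 2.7 follows Python 2.7, where the built-in `open` does not accept an `encoding` argument.

```python
import io

try:
    outputSchema  # assumed to be supplied by Pig when the script is registered
except NameError:
    # Hypothetical no-op fallback so the file can also be run outside Pig.
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap

# Search terms copied as-is from the question.
CRIMES = ['Rape', 'Murder', 'Extortion', 'Felony', 'Burglary',
          'Property Damage', 'Arrest', 'Political Unrest', 'Civil Unrest',
          'Solitication', 'Larceny', 'Abettor', 'Trafficking',
          'Tresspasser', 'Robbery']


def count_terms(path, terms):
    """Read the file once and count case-insensitive occurrences of each term."""
    with io.open(path, 'r', encoding='utf-8') as handle:
        data = handle.read().lower()
    return [(term, data.count(term.lower())) for term in terms]


@outputSchema('result:bag{t:(crime:chararray, value:int, percentage:double)}')
def codefunc(path):
    """Return (crime, count, percentage) tuples, highest percentage first."""
    counts = count_terms(path, CRIMES)
    total = sum(n for _, n in counts)
    # Guard against division by zero when no term occurs at all.
    rows = [(crime, n, 100.0 * n / total if total else 0.0)
            for crime, n in counts]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```

Unlike the original, this returns the rows to Pig instead of writing a CSV from inside the UDF, so the result could be written out with a STORE statement on the Pig side.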

Hope you like my answer! Yippee!!

Upvotes: 1
