user3476463

Reputation: 4575

Hive UDF with Python

I'm new to Python, pandas, and Hive and would appreciate some tips.

I have the Python code below, which I would like to turn into a Hive UDF. Instead of reading a CSV, applying the transformations, and exporting another CSV, I would like to take a Hive table as the input and write the results to a new Hive table containing the transformed data.

Python Code:

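import pandas as pd

# Load the input and index it by the two key fields
df = pd.read_csv('Input.csv')
df = df.set_index(['Field1', 'Field2'])
# One-hot encode Field3, then bring Field1 and Field2 back as columns
dummies = pd.get_dummies(df['Field3']).reset_index()
# Drop repeated rows, then collapse to one row per (Field1, Field2)
df2 = dummies.drop_duplicates()
df3 = df2.groupby(['Field1', 'Field2']).sum()
df3.to_csv('Output.csv')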

Upvotes: 4

Views: 14119

Answers (1)

visakh

Reputation: 2553

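You can use Hive's TRANSFORM clause to run a script written in Python in place of a UDF. Hive streams the selected rows to the script on standard input as tab-separated text and reads the result rows back from the script's standard output. The detailed steps are outlined here and here.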
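As a rough illustration, here is a minimal sketch of what such a streaming script could look like for the transformation in the question. It uses only the standard library rather than pandas, and it rests on several assumptions that are not in the question: the script receives exactly three tab-separated columns (Field1, Field2, Field3), the list of Field3 categories is known in advance and passed as script arguments (Hive needs a fixed output schema), and the names one_hot_rows and main are hypothetical.

```python
import sys
from collections import defaultdict


def one_hot_rows(lines, categories):
    """Yield one tab-separated indicator row per (Field1, Field2) key.

    `lines` is an iterable of tab-separated "Field1, Field2, Field3" rows,
    which is the form in which TRANSFORM hands rows to a script.
    `categories` is the assumed, fixed list of Field3 values to encode.
    """
    seen = defaultdict(set)  # (Field1, Field2) -> set of Field3 values seen
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        field1, field2, field3 = line.split("\t")
        # A set discards duplicates, mirroring drop_duplicates() + sum()
        seen[(field1, field2)].add(field3)
    for key in sorted(seen):
        flags = ["1" if c in seen[key] else "0" for c in categories]
        yield "\t".join(list(key) + flags)


def main():
    # Assumption: categories are passed as arguments, e.g. "script.py red blue"
    for row in one_hot_rows(sys.stdin, sys.argv[1:]):
        print(row)

# When deployed as a Hive script, call main() at the bottom of the file.
```

On the Hive side, the script would typically be registered with ADD FILE and invoked with SELECT TRANSFORM (Field1, Field2, Field3) USING 'python script.py ...' AS (...), with the result written to a new table through CREATE TABLE ... AS SELECT or INSERT. Because each mapper or reducer runs its own copy of the script, the rows for a given (Field1, Field2) pair should be routed to the same copy, for example with CLUSTER BY Field1, Field2 in a subquery; otherwise the same key may appear in the output more than once.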

Upvotes: 11
