ziky90

Reputation: 2697

Installing Pig 0.14 on Amazon EMR

I need to run Python streaming UDFs from Pig on Amazon EMR using Hadoop 2.x.
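For context, a Python streaming UDF is an ordinary Python function executed by a regular CPython interpreter on the task nodes. The Pig UDF documentation shows it decorated with `outputSchema` from the `pig_util` module that Pig supplies at runtime, and registered with `REGISTER 'udfs.py' USING streaming_python AS myfuncs;`. The sketch below is a minimal example under that assumption: the file name `udfs.py` and the function `to_upper` are hypothetical, and the fallback decorator exists only so the file can be imported and tested outside Pig.

```python
# udfs.py (hypothetical file name); register it from a Pig script with:
#   REGISTER 'udfs.py' USING streaming_python AS myfuncs;
try:
    # pig_util is provided by Pig when the file runs as a streaming_python UDF.
    from pig_util import outputSchema
except ImportError:
    # Local stand-in so the module can be imported and tested without Pig:
    # it just records the schema string on the function.
    def outputSchema(schema):
        def decorator(func):
            func.output_schema = schema
            return func
        return decorator


@outputSchema('word:chararray')
def to_upper(word):
    """Upper-case a chararray field; Pig nulls arrive as None and are passed through."""
    if word is None:
        return None
    return word.upper()
```

In the Pig script it would then be called like any other UDF, e.g. `myfuncs.to_upper(name)` inside a `FOREACH ... GENERATE`.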

According to the documentation, Python streaming UDFs work with Hadoop 2.x since Pig 0.14 (compare the 0.12 and 0.14 docs): http://pig.apache.org/docs/r0.12.0/udf.html#python-udfs http://pig.apache.org/docs/r0.14.0/udf.html#python-udfs
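The version requirement from the documentation can be encoded as a small check, for example in a setup script that should refuse to use streaming Python UDFs on an older Pig. This is a sketch under the assumption that the version string contains a dotted `major.minor` number such as `0.14.0`; the function names are hypothetical.

```python
import re

# Per the Pig docs linked in the question: streaming Python UDFs on Hadoop 2.x need Pig 0.14+.
MIN_PIG_FOR_HADOOP2_STREAMING = (0, 14)


def parse_pig_version(text):
    """Extract (major, minor) from text such as '0.14.0' or 'Apache Pig version 0.12.0 ...'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def supports_streaming_python_on_hadoop2(version_text):
    """True if the Pig version is at least 0.14 (tuples compare element by element)."""
    return parse_pig_version(version_text) >= MIN_PIG_FOR_HADOOP2_STREAMING
```

Comparing integer tuples rather than raw strings matters here: as strings, `"0.9"` would sort above `"0.14"`.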

I have personally verified that Python streaming UDFs do not work on 0.12, and since the 0.14 documentation no longer carries that note, it seems to me that they should work in that version.

Judging by the list of supported Pig versions in the Amazon EMR documentation, it looks like only Pig 0.12 and earlier are supported: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/Pig_SupportedVersions.html

So my question is: does anyone have an idea of, or experience with, how Pig 0.14 could be "hacked" onto or deployed to an EMR cluster? And do Python UDFs really work with Hadoop 2.x in Pig 0.14 (just to know whether the troublesome Pig 0.14 installation would be worth it)?

Upvotes: 1

Views: 237

Answers (1)

ziky90

Reputation: 2697

In the end I solved this by simply downloading Pig 0.14 onto all of the machines in the bootstrap script and overriding PIG_HOME with my Pig 0.14 location in ~/.bashrc, and it worked for me (at least for using Pig 0.14 when I'm connected to the master node via SSH).
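The approach in this answer can be sketched as a small Python script run on each node from the bootstrap step. Everything specific below is an assumption rather than what the author used: the Apache archive URL, the install directory (the user's home), and the helper names are illustrative. The script downloads and unpacks the tarball, then appends `export PIG_HOME=...` and a `PATH` entry to `~/.bashrc` only if they are not already there; because the lines are appended, they override any earlier PIG_HOME setting in that file.

```python
import os
import tarfile
import urllib.request

# Assumed values: adjust to your mirror and layout.
PIG_URL = "https://archive.apache.org/dist/pig/pig-0.14.0/pig-0.14.0.tar.gz"
INSTALL_DIR = os.path.expanduser("~")  # hypothetical install location


def install_pig(url=PIG_URL, install_dir=INSTALL_DIR):
    """Download the Pig tarball, unpack it, and return the assumed PIG_HOME path."""
    archive = os.path.join(install_dir, os.path.basename(url))
    urllib.request.urlretrieve(url, archive)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(install_dir)
    # Assumes the tarball unpacks into a directory named like the archive (pig-0.14.0).
    return os.path.join(install_dir, os.path.basename(url)[: -len(".tar.gz")])


def override_pig_home(bashrc_path, pig_home):
    """Append PIG_HOME/PATH exports once; later lines in .bashrc win over earlier ones."""
    lines = [
        f"export PIG_HOME={pig_home}\n",
        "export PATH=$PIG_HOME/bin:$PATH\n",
    ]
    existing = ""
    if os.path.exists(bashrc_path):
        with open(bashrc_path) as f:
            existing = f.read()
    with open(bashrc_path, "a") as f:
        if existing and not existing.endswith("\n"):
            f.write("\n")  # keep the exports on their own lines
        for line in lines:
            if line not in existing:  # idempotent: safe to run more than once
                f.write(line)
```

On a node this would be invoked as `override_pig_home(os.path.expanduser("~/.bashrc"), install_pig())`. As the answer notes, a `.bashrc` change only affects interactive shells (e.g. an SSH session on the master), not jobs launched by other means.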

Upvotes: 0
