Prasanna

Reputation: 2661

Implementation of Apache Pig and Hadoop

I have learnt that Pig is built on top of Apache Hadoop, but I am not able to find the extra features Pig has that a bare Hadoop implementation lacks. What caused the need for a language like Pig Latin? What was Hadoop lacking?

Upvotes: 1

Views: 413

Answers (1)

Amar

Reputation: 12020

Quoting from wiki:

Pig is a high-level platform for creating MapReduce programs used with Hadoop. The language for this platform is called Pig Latin. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation which makes MapReduce programming high level, similar to that of SQL for RDBMS systems. Pig Latin can be extended using UDFs (User Defined Functions), which the user can write in Java, Python or JavaScript and then call directly from the language.

Now, the keywords above are high-level and abstracts. Just as we have DBAs who can create and manage databases knowing no programming language other than SQL, we can have data engineers creating and managing data pipelines and warehouses using Pig without getting into the complexities of how their scripts are implemented and executed as Hadoop jobs. So, to answer your question: Pig is not there to complement Hadoop in any feature it lacks; it is just a high-level framework built on top of Hadoop to do things faster (in development time).

You can certainly do everything that Pig does with bare Hadoop, but try out some of Pig's advanced features and writing the equivalent Hadoop jobs by hand will take a really long time. So, speaking very liberally, tasks that are generic and common throughout data engineering have been implemented on bare Hadoop beforehand in the form of Pig; you just need to express them in Pig Latin to have them executed.
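To make the "high-level vs. hand-written" contrast concrete, here is a rough sketch, in plain Python rather than actual Hadoop Java code, of the map, shuffle, and reduce steps a hand-written word-count job must spell out explicitly. The function names and sample input are illustrative only; the Pig Latin statements quoted in the comment are the commonly shown word-count idiom:

```python
# In Pig Latin, roughly this entire pipeline collapses to a few statements:
#
#   words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
#   grouped = GROUP words BY word;
#   counts  = FOREACH grouped GENERATE group, COUNT(words);
#
# Below, each phase that Pig hides is written out by hand.
from collections import defaultdict

def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle: group values by key (Hadoop does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: sum the counts for each word."""
    return {word: sum(values) for word, values in groups.items()}

lines = ["pig runs on hadoop", "hadoop runs mapreduce"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)  # {'pig': 1, 'runs': 2, 'on': 1, 'hadoop': 2, 'mapreduce': 1}
```

A real Hadoop job additionally needs Mapper and Reducer classes, job configuration, and serialization boilerplate; Pig generates all of that for you, which is exactly the development-time saving described above.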

Upvotes: 2

Related Questions