Reputation: 10139
I am writing a Python UDF for Hadoop/Pig and need to use some Python libraries such as "request", which I installed locally with pip while testing the UDF on my local box. How can I deploy the pip package on the Hadoop cluster so that my Python UDF can use it automatically, no matter which node it runs on?
Upvotes: 1
Views: 1361
Reputation:
Information about the zip file format can be found at Zip (file format). Practically speaking, it is a compressed archive format, roughly like tar (an archive format) combined with gzip (a file compression format). The Java jar (Java ARchive) format is compatible with zip.
On Linux and Unix platforms, a directory dir can be zipped with 'zip -r dir dir' to create a dir.zip file. On Windows, 7-Zip is most useful for creating and unbundling zip files; it can also unbundle and browse files in other compression and archive formats, including tar and gzip.
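If the zip command is not available, roughly the same result as 'zip -r dir dir' can be had from Python's standard library. This is only a sketch: the helper name zip_directory is made up for illustration, and it assumes the archive should sit next to the directory and keep the top-level folder name inside it.

```python
import shutil
from pathlib import Path


def zip_directory(dir_path):
    """Create <dir>.zip next to dir_path (hypothetical helper, not part of any library)."""
    src = Path(dir_path).resolve()
    archive = shutil.make_archive(
        base_name=str(src),        # archive path without the ".zip" suffix
        format="zip",
        root_dir=str(src.parent),  # paths inside the archive are relative to this
        base_dir=src.name,         # keep "dir/" as the top-level entry
    )
    return Path(archive)
```

Calling zip_directory("dir") leaves dir in place and returns the path of the new dir.zip.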
Given a file dir.tar.gz, it can be unbundled and zipped interactively using the 7-Zip GUI on Windows, while on Linux and Unix systems the following commands do the same thing:
tar zxf dir.tar.gz # creates directory dir by extraction and decompression
zip -r dir dir # creates dir.zip by bundling without removing dir
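The same two steps can also be done from Python, which may be handy on a box without the tar and zip commands. A minimal sketch, assuming a trusted archive: the function name targz_to_zip and the out_dir parameter are hypothetical, and unlike the commands above it extracts into a temporary scratch directory instead of leaving dir in the current directory.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def targz_to_zip(targz_path, out_dir="."):
    """Re-bundle the contents of a .tar.gz as a .zip (hypothetical helper)."""
    targz_path = Path(targz_path)
    name = targz_path.name
    # "dir.tar.gz" -> "dir"
    stem = name[: -len(".tar.gz")] if name.endswith(".tar.gz") else targz_path.stem
    with tempfile.TemporaryDirectory() as scratch:
        # Equivalent of "tar zxf": extract and decompress into the scratch directory.
        with tarfile.open(targz_path, "r:gz") as tar:
            tar.extractall(scratch)  # assumption: the archive comes from a trusted source
        # Equivalent of "zip -r": bundle everything that was extracted.
        archive = shutil.make_archive(
            str(Path(out_dir).resolve() / stem), "zip", root_dir=scratch
        )
    return Path(archive)
```

For example, targz_to_zip("dir.tar.gz") writes dir.zip into the current directory and returns its path.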
Upvotes: 1