Reputation: 21
I am writing a Hadoop application using mrjob. I need to use the distributed cache to access some files. I know that Hadoop Streaming has a -files option, but I don't know how to access those files from within the program.
Thanks for your help.
Upvotes: 2
Views: 805
Reputation: 781
I think you have to use
mrjob.compat.supports_new_distributed_cache_options(version)
to check your Hadoop version, and then use -files and -archives instead of -cacheFile and -cacheArchive.
Maybe you will find more information here.
Upvotes: 2
Reputation: 12010
You can read the files in your program as though they were local, i.e. each file appears in the working directory of the running code.
I am not good at Python, so here is an example in Ruby (mapper.rb):
begin
  file = File.open("my-distributed-cache-file.txt")
  while (line = file.gets)
    # do something with your file
  end
  file.close
end
# Rest of mapper code
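For anyone who wants the same idea in Python, here is a rough sketch. It assumes the file was shipped with -files and therefore shows up in the task's working directory under its own name. It uses plain Python only and does not touch the mrjob API. The helper names load_lookup and map_lines are made up for illustration, as is the tab-separated key/value layout of the side file.

```python
def load_lookup(path):
    """Read a tab-separated key/value side file into a dict.

    `path` is a relative name such as "my-distributed-cache-file.txt";
    the file layout is an assumption made for this example.
    """
    lookup = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            key, _, value = line.partition("\t")
            lookup[key] = value
    return lookup


def map_lines(lines, lookup):
    """Yield (key, value) for each input line whose key is in the side file."""
    for line in lines:
        key = line.strip()
        if key in lookup:
            yield key, lookup[key]
```

In a streaming mapper you would call load_lookup("my-distributed-cache-file.txt") once at start-up and then feed sys.stdin to map_lines. In an mrjob job, the one-time load would probably belong in a set-up step rather than in the per-line mapper, so the file is not re-read for every record.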
Upvotes: -1