user2257622

Reputation: 21

Access distributed cache from MrJob

I am writing a Hadoop app using mrjob, and I need to use the distributed cache to access some files. I know that Hadoop streaming has a -files option, but I don't know how to access those files from within the program.

Thanks for your help.

Upvotes: 2

Views: 805

Answers (2)

Manish Verma

Reputation: 781

I think you have to use

mrjob.compat.supports_new_distributed_cache_options(version)

and then use -files and -archives instead of -cacheFile and -cacheArchive.

Maybe you will find more here

Upvotes: 2

Amar

Reputation: 12010

Read the files in your program as though they were local, i.e. as if each file sat in the same directory as the running code.

I am not good at Python, so here is an example in Ruby, mapper.rb:

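# Open the cached file by its bare name, as if it were in the working directory.
File.open("my-distributed-cache-file.txt") do |file|
    file.each_line do |line|
        # do something with each line of the file
    end
end  # the block form closes the file automatically
# Rest of mapper code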
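
Since the question is about Python, here is a rough equivalent of the same idea, as a minimal sketch only. It assumes the cached file shows up in the task's working directory under its bare name, as described above. The helper names load_cache_lines and map_line are hypothetical, and nothing here is part of the mrjob API.

```python
import os


def load_cache_lines(filename="my-distributed-cache-file.txt"):
    """Read a cached file from the task's working directory (assumed location)."""
    # A bare filename resolves against the current working directory, which is
    # where the cached file is assumed to be available to the task.
    path = os.path.join(os.getcwd(), filename)
    with open(path, encoding="utf-8") as cache_file:
        return [line.rstrip("\n") for line in cache_file]


def map_line(line, lookup):
    """Hypothetical mapper step: emit (word, 1) for each word found in lookup."""
    return [(word, 1) for word in line.split() if word in lookup]
```

In a real job you would probably load the file once per task (for example into a set) rather than once per input line, and then call something like map_line for each record.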

Upvotes: -1
