ankushb

Reputation: 198

Hadoop Input files Order

I have data files arranged in folders named after dates. Each directory contains around 50 files that I need to parse, and I am passing /data/** /** /** to Hadoop as the input path so that it parses all of the files. My questions are:

  1. How can I make Hadoop process the input in order? I need to parse the files date by date.
  2. While parsing the files for a particular date, I need to preload a data structure that is associated with that date and stored in the same date directory.

Thanks, Ankush

Upvotes: 1

Views: 574

Answers (1)

Niels Basjes

Reputation: 10652

  1. You can't order the input. In the "worst case", if you have as many input files as the cluster can run tasks concurrently, they will all be processed in parallel at the same moment.
  2. Perhaps you can create a custom implementation of "FileInputFormat" that reads the required config file for each date directory and does what you need?
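
As a rough illustration of the second point, here is a minimal sketch in plain Java (no Hadoop dependency) of the bookkeeping a custom input format or a driver would need: deriving the date directory from each file path and grouping the files by it. The class name `DateDirGrouping`, its methods, and the sample paths are hypothetical. It assumes the date directories sort chronologically as strings (for example zero-padded `yyyy-MM-dd` names), which the question does not confirm. Inside a mapper, the current file's path can usually be obtained with `((FileSplit) context.getInputSplit()).getPath()`, and the same parent-directory logic would then tell you which per-date data structure to load.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DateDirGrouping {

    // Returns the directory that directly contains the file,
    // e.g. "/data/2012-01-05/part-00" -> "/data/2012-01-05".
    // The path layout is an assumption made for illustration.
    public static String parentDir(String filePath) {
        int lastSlash = filePath.lastIndexOf('/');
        if (lastSlash <= 0) {
            throw new IllegalArgumentException("No parent directory in: " + filePath);
        }
        return filePath.substring(0, lastSlash);
    }

    // Groups file paths by their parent directory. A TreeMap keeps the
    // keys sorted as strings, which matches date order only if the
    // directory names are zero-padded dates.
    public static Map<String, List<String>> groupByDateDir(List<String> filePaths) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String path : filePaths) {
            groups.computeIfAbsent(parentDir(path), k -> new ArrayList<>()).add(path);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
                "/data/2012-01-06/log-b",
                "/data/2012-01-05/log-a",
                "/data/2012-01-06/log-c");
        // Each entry could become one unit of work (for example one job
        // per date, submitted in order) that first loads the side data
        // stored in that directory.
        for (Map.Entry<String, List<String>> e : groupByDateDir(files).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

This does not make Hadoop order its input within a single job. If strict date-by-date processing is required, one option under these assumptions is to submit one job per date directory in sequence, at the cost of less parallelism.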

Upvotes: 1
