Jeremiah

Load multiple files from multiple directories into Pig

Hello, I have a directory with subdirectories a1, a2, ..., a8, and each of these subdirectories contains multiple files like:

  bat-a1-0-0
  bat-a1-0-1
  bat-a1-1-0
  bat-a1-1-1
  ...
  bat-a1-31-0
  bat-a1-31-1

and subdirectory a2 is similar:

  bat-a2-0-0
  bat-a2-0-1
  bat-a2-1-0
  bat-a2-1-1
  ...
  bat-a2-31-0
  bat-a2-31-1

To keep things simple, I decided to use one LOAD statement per directory and then UNION the results to get all the data. However, I do not know how to load the files in each directory using Apache Pig version 0.10.0-cdh4.2.1, since the file names do not seem to follow a simple pattern. Any help is appreciated, thanks.
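A minimal sketch of the per-directory plan described above. It assumes the subdirectories live under a hypothetical parent path `/data/bat` and that the files are tab-delimited text readable with `PigStorage()`; both are assumptions, not details from the question. Pig accepts Hadoop glob characters in LOAD paths, so `bat-a1-*` matches every file in `a1` regardless of the numeric suffixes.

```pig
-- Hypothetical parent path /data/bat; replace with the real location.
-- PigStorage() with no argument reads tab-delimited text (assumed file format).
a1 = LOAD '/data/bat/a1/bat-a1-*' USING PigStorage();
a2 = LOAD '/data/bat/a2/bat-a2-*' USING PigStorage();
a3 = LOAD '/data/bat/a3/bat-a3-*' USING PigStorage();
a4 = LOAD '/data/bat/a4/bat-a4-*' USING PigStorage();
a5 = LOAD '/data/bat/a5/bat-a5-*' USING PigStorage();
a6 = LOAD '/data/bat/a6/bat-a6-*' USING PigStorage();
a7 = LOAD '/data/bat/a7/bat-a7-*' USING PigStorage();
a8 = LOAD '/data/bat/a8/bat-a8-*' USING PigStorage();

-- UNION concatenates the relations; it does not remove duplicates.
all_data = UNION a1, a2, a3, a4, a5, a6, a7, a8;
```

The glob only has to match the part of the name that varies, so the `0`..`31` and `0`/`1` suffixes never need to be spelled out.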

Upvotes: 2

Views: 383

Answers (1)

Dennis Jaheruddin

In fact, this may be simpler than you think. When you load files in Pig, you can simply point LOAD at a directory, and Pig will recursively load all files under it, even those that are deeply nested.

So the solution is: make sure all your data is under one (or a few) directories, and load those directories.
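A short sketch of this single-LOAD approach, again assuming a hypothetical parent directory `/data/bat` that holds `a1` through `a8` and tab-delimited files. The second statement shows a Hadoop glob alternative for when the parent directory also contains files you want to exclude.

```pig
-- Point LOAD at the (hypothetical) parent directory; Pig descends into
-- a1 ... a8 and reads every file it finds there.
all_data = LOAD '/data/bat' USING PigStorage();

-- Narrower alternative: [1-8] matches a single character in that range,
-- and * matches the rest of each file name.
selected = LOAD '/data/bat/a[1-8]/bat-*' USING PigStorage();
```

Either form yields one relation directly, so no UNION is needed.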

Upvotes: 1
