Anay T

Reputation: 56

What does the hdfs dfs -getmerge command do?

A Hive query of mine (using DISTRIBUTE BY and SORT BY) produces more than one output file, and I want to merge them into a single file. I tried the hdfs dfs -getmerge command for this. Does -getmerge sort the files before concatenating them, or does it simply concatenate them?

Upvotes: 2

Views: 2202

Answers (2)

sam

Reputation: 143

Here is the documentation (for Hadoop 2.7.1): https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/FileSystemShell.html#getmerge

Basically, it does two things:

1 - Concatenates the source files into one local file.
2 - Optionally adds a newline (-nl) at the end of each concatenated file.

Ex: $ hadoop fs -getmerge [-nl] <src> <localdst>
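
To make those two steps concrete, here is a minimal sketch in plain Java that works on the local filesystem instead of HDFS. It is not Hadoop code: the class name ConcatSketch, the method concat, and the file names are made up for illustration, and it assumes the newline is written after each file.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical local-filesystem stand-in for getmerge; not part of Hadoop.
class ConcatSketch {

    // Appends each source file to dst in the order given.
    // When addNewline is true, a "\n" is written after each file (assumed -nl behaviour).
    static void concat(List<Path> sources, Path dst, boolean addNewline) throws IOException {
        try (OutputStream out = Files.newOutputStream(dst)) {
            for (Path src : sources) {
                Files.copy(src, out); // raw byte copy; records are not parsed or reordered
                if (addNewline) {
                    out.write("\n".getBytes(StandardCharsets.UTF_8));
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("getmerge-sketch");
        Path first = Files.write(dir.resolve("part-0"), "b,2".getBytes(StandardCharsets.UTF_8));
        Path second = Files.write(dir.resolve("part-1"), "a,1".getBytes(StandardCharsets.UTF_8));
        Path merged = dir.resolve("merged.txt");

        concat(Arrays.asList(first, second), merged, true);
        System.out.print(new String(Files.readAllBytes(merged), StandardCharsets.UTF_8));
    }
}
```

The rows come out in file order ("b,2" before "a,1"), which shows that concatenation alone does not reorder the data inside the files.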

Upvotes: 0

Sambit Tripathy

Reputation: 444

public static boolean copyMerge(FileSystem srcFS, Path srcDir,
                                FileSystem dstFS, Path dstFile,
                                boolean deleteSource,
                                Configuration conf, String addString) throws IOException {
  dstFile = checkDest(srcDir.getName(), dstFS, dstFile, false);

  // Only a directory can be merged.
  if (!srcFS.getFileStatus(srcDir).isDirectory())
    return false;

  OutputStream out = dstFS.create(dstFile);

  try {
    FileStatus contents[] = srcFS.listStatus(srcDir);
    // Sort the directory listing so the files are appended in a fixed order.
    Arrays.sort(contents);
    for (int i = 0; i < contents.length; i++) {
      if (contents[i].isFile()) {
        InputStream in = srcFS.open(contents[i].getPath());
        try {
          // Append the file's bytes unchanged, then the optional separator string.
          IOUtils.copyBytes(in, out, conf, false);
          if (addString != null)
            out.write(addString.getBytes("UTF-8"));
        } finally {
          in.close();
        }
      }
    }
  } finally {
    out.close();
  }

  if (deleteSource) {
    return srcFS.delete(srcDir, true);
  } else {
    return true;
  }
}

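The source above is from Hadoop 0.23. It sorts the array of files (Arrays.sort(contents), ascending by default) and then concatenates them in that order. The data inside each file is copied as-is; it is not sorted.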
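
A small sketch of what that ordering means in practice, assuming the sort effectively compares paths as strings (my reading of how FileStatus is ordered; check it against your Hadoop version). Plain strings stand in for HDFS paths here, and the class name MergeOrderSketch and the file names are made up for illustration.

```java
import java.util.Arrays;

// Illustration only: plain strings stand in for HDFS paths (an assumption).
class MergeOrderSketch {

    // Returns a copy of the names in ascending lexicographic order.
    static String[] mergeOrder(String... names) {
        String[] sorted = names.clone();
        Arrays.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // Zero-padded names sort in numeric order.
        System.out.println(Arrays.toString(mergeOrder("000002_0", "000000_0", "000001_0")));
        // prints [000000_0, 000001_0, 000002_0]

        // Unpadded names do not: "part-10" sorts before "part-2".
        System.out.println(Arrays.toString(mergeOrder("part-2", "part-10", "part-1")));
        // prints [part-1, part-10, part-2]
    }
}
```

So if the output files carry zero-padded names, the merged file follows the numeric file order; with unpadded names the order is lexicographic and may not match the numeric one.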

Upvotes: 4
