Alexis Eggermont

Reputation: 8145

Merging the output of Spark into one file

I understand that my question is similar to Merge Output files after reduce phase; however, I think it may be different because I am using Spark on a single local machine rather than an actual distributed file system.

I have Spark installed on a single VM (for testing). The output is written to several files (part-000000, part-000001, etc...) in a folder called 'STjoin' under Home/Spark_Hadoop/spark-1.1.1-bin-cdh4/.

The command hadoop fs -getmerge /Spark_Hadoop/spark-1.1.1-bin-cdh4/STjoin /desired/local/output/file.txt does not seem to work ("No such file or directory").

Is this because the command only applies to files stored in HDFS rather than on the local filesystem, or am I misunderstanding Linux paths in general? (I am new to both Linux and HDFS.)
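One quick check (a sketch; the paths follow the layout described above, assuming the STjoin folder sits in the home directory): plain ls reads the local filesystem, while hadoop fs -ls reads whatever filesystem Hadoop is configured to use, so the same path can exist in one and not the other.

    # Local filesystem: should list part-000000, part-000001, ...
    ls ~/Spark_Hadoop/spark-1.1.1-bin-cdh4/STjoin

    # Filesystem seen by hadoop fs (HDFS, or the local one, depending on
    # fs.defaultFS); a missing path gives "No such file or directory"
    hadoop fs -ls /Spark_Hadoop/spark-1.1.1-bin-cdh4/STjoin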

Upvotes: 2

Views: 2936

Answers (1)

frb

Reputation: 3798

Simply do cat /path/to/source/dir/* > /path/to/output/file.txt. getmerge is the Hadoop equivalent; it only applies to files stored in HDFS.
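Applied to the layout in the question, a minimal sketch (the STjoin path is the one described in the question, assuming it sits in the home directory; the merged file name is just an example):

    # Concatenate every local part file into one output file
    cat ~/Spark_Hadoop/spark-1.1.1-bin-cdh4/STjoin/part-* > ~/STjoin_merged.txt

    # getmerge would only apply if STjoin lived in HDFS, e.g.:
    # hadoop fs -getmerge /path/in/hdfs/STjoin /desired/local/output/file.txt

Using the part-* glob rather than * also skips the _SUCCESS marker file that Spark's Hadoop-based output format typically writes alongside the part files.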

Upvotes: 4
