Reputation: 8145
I understand that my question is similar to Merge Output files after reduce phase; however, I think it may be different because I am using Spark only on a local machine and not on an actual distributed file system.
I have Spark installed on a single VM (for testing). The output is written as several files (part-000000, part-000001, etc.) in a folder called 'STjoin' under Home/Spark_Hadoop/spark-1.1.1-bin-cdh4/.
The command
hadoop fs -getmerge /Spark_Hadoop/spark-1.1.1-bin-cdh4/STjoin /desired/local/output/file.txt
does not seem to work ("No such file or directory").
Is this because the command only applies to files stored in HDFS and not to local files, or am I misunderstanding Linux paths in general? (I am new to both Linux and HDFS.)
Upvotes: 2
Views: 2936
Reputation: 3798
Simply do cat /path/to/source/dir/* > /path/to/output/file.txt. getmerge is the Hadoop equivalent, and it only works on files stored in HDFS.
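For example, applied to the directory from the question (a sketch; the source path is taken from the question and the output file name is hypothetical):

cat ~/Spark_Hadoop/spark-1.1.1-bin-cdh4/STjoin/part-* > ~/STjoin_merged.txt

Using part-* rather than a bare * skips any non-data files, such as the _SUCCESS marker, that Spark may place in the output directory.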
Upvotes: 4