divide spark rdd into 2 separate files based on certain keys

Question

I'm using Cloudera VM, a linux terminal and spark version 1.6.0

Let's say I have the following dataset:

Priority, qty, sales => I'm not importing headers.

low,6,261.54

high,44,1012

low,1,240

high,25,2500

I can load, "val inputFile = sc.textFile("file:///home/cloudera/stat.txt")

I can sort, "inputFile.sortBy(x=>x(1),true).collect

but I want to place low and high priority data into 2 separate files.

Would that be a filter or reduceby or partitioning? how best could I do that? If I can get help with that I think I might be able to wrap my head around creating an RDD of priority & sales, qty & sales.

divide spark rdd into 2 separate files based on certain keys

Answers (1)

Related Questions