Reputation: 153
I have a file which is tab separated. The third column should be my key and the entire record should be my value (as per Map reduce concept).
val cefFile = sc.textFile("C:\\text1.txt")
val cefDim1 = cefFile.filter { line => line.startsWith("1") }
val joinedRDD = cefFile.map(x => x.split("\\t"))
joinedRDD.first().foreach { println }
I am able to get the value of first column but not third. Can anyone suggest me how I could accomplish this?
Upvotes: 2
Views: 7522
Reputation: 7452
After you've done the split x.split("\\t")
your rdd (which in your example you called joinedRDD
but I'm going to call it parsedRDD
since we haven't joined it with anything yet) is going to be an RDD of Arrays. We could turn this into an array of key/value tuples by doing parsedRDD.map(r => (r(2), r))
. That being said - you aren't limited to just map & reduce operations in Spark so its possible that another data structure might be better suited. Also for tab separated files, you could use spark-csv along with Spark DataFrames if that is a good fit for the eventual problem you are looking to solve.
Upvotes: 2