Reputation: 17585
Let's say I am importing a flat file from HDFS into Spark using something like the following:
val data = sc.textFile("hdfs://name_of_file.tsv").map(_.split('\t'))
This will produce an RDD[Array[String]]. If I wanted an RDD of tuples, I could map the elements to a tuple, as referenced in this solution:
val dataToTuple = data.map{ case Array(x,y) => (x,y) }
But what if my input data has, say, 100 columns? Is there a way in Scala, using some sort of wildcard, to write
val dataToTuple = data.map{ case Array(x,y, ... ) => (x,y, ...) }
without having to write out 100 variables to match on?
I tried doing something like
val dataToTuple = data.map{ case Array(_) => (_) }
but that didn't seem to make much sense.
Upvotes: 0
Views: 2370
Reputation: 22374
If your data columns are homogeneous (like an Array of Strings), a tuple may not be the best way to improve type safety. All you can do is fix the size of your array using a sized list from the Shapeless library:
How to require typesafe constant-size array in scala?
This is the right approach if your columns are unnamed. For instance, each row might represent a vector in Euclidean space.
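For illustration, here is a minimal sketch of that approach. It assumes shapeless 2.x on the classpath; the syntax.sized API differs slightly between versions, and the object name SizedSketch is made up.

import shapeless._
import shapeless.syntax.sized._

object SizedSketch extends App {
  val row: List[String] = "a\tb\tc".split('\t').toList

  // sized(3) returns Some only when the list really has exactly 3 elements,
  // so the length becomes part of the type and arity bugs surface early.
  row.sized(3) match {
    case Some(fixed) => println(s"3-column row: ${fixed.unsized.mkString(" | ")}")
    case None        => println("row does not have exactly 3 columns")
  }
}

Calling unsized gives you back the plain collection whenever you need to hand it to code that doesn't care about the statically known length.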
Otherwise (named columns, possibly of different types), it's better to model the row with a case class, but be aware of the size restriction (tuples stop at 22 elements, and case classes had the same 22-field limit before Scala 2.11). This might help you quickly map an array (or parts of it) to an ADT: https://stackoverflow.com/a/19901310/1809978
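A minimal sketch of the case-class route, using a made-up 3-column schema (the case class TsvRow and the app name are hypothetical stand-ins for your real 100-column layout):

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical schema standing in for the real, much wider one.
case class TsvRow(id: String, name: String, score: Double)

object CaseClassSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("tsv-to-case-class").setMaster("local[*]"))

    val rows = sc.textFile("hdfs://name_of_file.tsv")
      .map(_.split('\t'))
      .collect { case Array(id, name, score) => TsvRow(id, name, score.toDouble) } // rows with the wrong arity are dropped

    rows.take(5).foreach(println)
    sc.stop()
  }
}

The pattern match on Array(id, name, score) is the same idea as your tuple match, but you only name the fields once, in the case class definition; collect with a partial function silently skips malformed rows, so use map with an explicit check instead if you'd rather fail loudly.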
Upvotes: 1