Reputation: 1114
Given a list of strings, is there a way to create a case class or a schema without typing the strings in manually?
For example, I have a list:
val name_list=Seq("Bob", "Mike", "Tim")
The list will not always be the same: sometimes it will contain different names, and its size will vary.
I can create a case class:
case class Names(Bob: Int, Mike: Int, Tim: Int)
or a schema:
import org.apache.spark.sql.types._
val schema = StructType(StructField("Bob", IntegerType, true) ::
  StructField("Mike", IntegerType, true) ::
  StructField("Tim", IntegerType, true) :: Nil)
but I have to do it manually. I am looking for a method to perform this operation dynamically.
Upvotes: 1
Views: 3945
Reputation: 543
All the answers above cover only one aspect: creating the schema. Here is one solution you can use to create a case class from the generated schema: https://gist.github.com/yoyama/ce83f688717719fc8ca145c3b3ff43fd
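The gist is the solution referenced above; what follows is not its code, just a minimal sketch of the same idea under stated assumptions. A case class is a compile-time construct, so "creating" one from a schema means generating Scala source text that still has to be compiled (for example by writing it to a source file in your project). The helpers `scalaTypeName` and `caseClassSource` are hypothetical names, only a handful of Spark SQL types are mapped, and nullable fields are rendered as `Option[...]` as a design choice.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper: maps a few common Spark SQL types to Scala type names.
// Any type not listed falls back to "Any"; extend the match as needed.
def scalaTypeName(dt: DataType): String = dt match {
  case IntegerType => "Int"
  case LongType    => "Long"
  case DoubleType  => "Double"
  case StringType  => "String"
  case BooleanType => "Boolean"
  case _           => "Any"
}

// Hypothetical helper: renders the source of a case class whose fields
// mirror the schema, in schema order. Nullable fields become Option[...].
def caseClassSource(className: String, schema: StructType): String = {
  val fields = schema.fields.map { f =>
    val base = scalaTypeName(f.dataType)
    val tpe  = if (f.nullable) s"Option[$base]" else base
    s"${f.name}: $tpe"
  }
  s"case class $className(${fields.mkString(", ")})"
}

val schema = StructType(Seq("Bob", "Mike", "Tim").map(n => StructField(n, IntegerType, nullable = true)))
println(caseClassSource("Names", schema))
// prints: case class Names(Bob: Option[Int], Mike: Option[Int], Tim: Option[Int])
```

Field names are used verbatim, so names that are not valid Scala identifiers would need backquoting in a fuller version.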
Upvotes: 0
Reputation: 23119
If all the fields have the same data type, you can simply create the schema like this:
import org.apache.spark.sql.types._
val name_list = Seq("Bob", "Mike", "Tim")
val fields = name_list.map(name => StructField(name, IntegerType, true))
val schema = StructType(fields)
If the fields have different data types, create a map from field name to data type and build the schema from it in the same way as above.
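A minimal sketch of the map-based approach, assuming the column names and types are known up front (the three entries below are sample values). It uses `ListMap` rather than a plain `Map` because a plain `Map` does not guarantee iteration order in general, and the order of the entries decides the column order of the schema.

```scala
import org.apache.spark.sql.types._
import scala.collection.immutable.ListMap

// Column name -> Spark SQL type; ListMap preserves insertion order,
// so the schema columns come out in the order written here.
val fieldTypes = ListMap[String, DataType](
  "Bob"  -> StringType,
  "Mike" -> IntegerType,
  "Tim"  -> DoubleType
)

// One nullable StructField per (name, type) entry.
val schema = StructType(fieldTypes.toSeq.map { case (name, dataType) =>
  StructField(name, dataType, nullable = true)
})
```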
Hope this helps!
Upvotes: 0
Reputation: 22449
Assuming all the columns have the same data type:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
val nameList=Seq("Bob", "Mike", "Tim")
val schema = StructType(nameList.map(n => StructField(n, IntegerType, true)))
// schema: org.apache.spark.sql.types.StructType = StructType(
// StructField(Bob,IntegerType,true), StructField(Mike,IntegerType,true), StructField(Tim,IntegerType,true)
// )
// rdd: an existing RDD[Row] whose values match the schema's column order and types
spark.createDataFrame(rdd, schema)
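The snippet above leaves `rdd` undefined. Here is a self-contained sketch of the whole flow, assuming a local Spark session (`master("local[*]")`) and two made-up rows of sample data; in `spark-shell` the `spark` value already exists and `getOrCreate()` simply returns it.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

// Local session for the example (assumption: running outside a cluster).
val spark = SparkSession.builder().master("local[*]").appName("dynamic-schema").getOrCreate()

val nameList = Seq("Bob", "Mike", "Tim")
val schema = StructType(nameList.map(n => StructField(n, IntegerType, nullable = true)))

// Sample data: each Row needs one value per column, in schema order.
// null is allowed here because every field is nullable.
val rows = Seq(Row(1, 2, 3), Row(4, 5, null))
val rdd = spark.sparkContext.parallelize(rows)

val df = spark.createDataFrame(rdd, schema)
df.show()
```

Since the rows are plain `Row` objects, the schema is the only place the column names appear, so a different `nameList` needs no code changes as long as the rows have a matching number of values.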
If the data types are different, you'll have to provide them as well (in which case it might not save much time compared with assembling the schema manually):
val typeList = Array[DataType](StringType, IntegerType, DoubleType)
val colSpec = nameList zip typeList
val schema = StructType(colSpec.map(cs => StructField(cs._1, cs._2, true)))
// schema: org.apache.spark.sql.types.StructType = StructType(
// StructField(Bob,StringType,true), StructField(Mike,IntegerType,true), StructField(Tim,DoubleType,true)
// )
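One caveat with `zip`: it silently drops the extra elements of the longer collection, so a missing or extra type produces a shorter schema instead of an error. A small sketch that checks the lengths first, using the same sample names and types as above:

```scala
import org.apache.spark.sql.types._

val nameList = Seq("Bob", "Mike", "Tim")
val typeList = Seq[DataType](StringType, IntegerType, DoubleType)

// Fail fast instead of letting zip truncate to the shorter list.
require(nameList.length == typeList.length,
  s"got ${nameList.length} names but ${typeList.length} types")

// Pair each name with its type and build one nullable field per pair.
val schema = StructType(nameList.zip(typeList).map { case (name, dataType) =>
  StructField(name, dataType, nullable = true)
})
```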
Upvotes: 3