Reputation: 53
How can I create multiple DataFrames using the same case class? Suppose I want to create two DataFrames, one with 5 columns and another with 3 columns. How can I achieve that using a single case class?
Upvotes: 1
Views: 156
Reputation: 1590
You can't directly create two DataFrames with different numbers of columns from a single case class. Assume you have the case class FlightData shown below. If you create a DataFrame from this case class, it will contain 3 columns. However, you can create a second DataFrame from it by selecting only some of the columns. If you have two different files and each file has a different structure, you need to create two separate case classes.
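import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Assumes an existing SparkSession named `spark` (as in spark-shell).
import spark.implicits._

// Case class describing the three columns.
case class FlightData(DEST_COUNTRY_NAME: String, ORIGIN_COUNTRY_NAME: String, count: Int)

val someData = Seq(
  Row("United States", "Romania", 15),
  Row("United States", "Croatia", 1),
  Row("United States", "Ireland", 344),
  Row("Egypt", "United States", 15)
)

val flightDataSchema = List(
  StructField("DEST_COUNTRY_NAME", StringType, true),
  StructField("ORIGIN_COUNTRY_NAME", StringType, true),
  StructField("count", IntegerType, true)
)

// Typed Dataset[FlightData] with all three columns.
val dataDS = spark.createDataFrame(
  spark.sparkContext.parallelize(someData),
  StructType(flightDataSchema)
).as[FlightData]

// Same data and case class, but only one column is selected;
// the result of select is an untyped DataFrame with a single column.
val dataDS_2 = spark.createDataFrame(
  spark.sparkContext.parallelize(someData),
  StructType(flightDataSchema)
).as[FlightData].select($"DEST_COUNTRY_NAME")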
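
For the other case mentioned above, where the two sources really have different structures, a minimal sketch with two separate case classes might look like the following. The names FlightSummary, FlightDetail, carrier, delayed and the object TwoCaseClasses are hypothetical, made up for illustration, and the sketch assumes a local Spark installation on the classpath.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case classes, one per row shape (3 columns and 5 columns).
// They are declared at the top level so Spark can derive encoders for them.
case class FlightSummary(DEST_COUNTRY_NAME: String, ORIGIN_COUNTRY_NAME: String, count: Int)
case class FlightDetail(
  DEST_COUNTRY_NAME: String,
  ORIGIN_COUNTRY_NAME: String,
  count: Int,
  carrier: String,   // hypothetical extra column
  delayed: Boolean   // hypothetical extra column
)

object TwoCaseClasses {
  def main(args: Array[String]): Unit = {
    // Local session for the sketch; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("two-case-classes")
      .getOrCreate()
    import spark.implicits._

    // Dataset with 3 columns.
    val summaryDS = Seq(
      FlightSummary("United States", "Romania", 15),
      FlightSummary("Egypt", "United States", 15)
    ).toDS()

    // Dataset with 5 columns.
    val detailDS = Seq(
      FlightDetail("United States", "Croatia", 1, "XX", false),
      FlightDetail("United States", "Ireland", 344, "YY", true)
    ).toDS()

    // A 5-column Dataset can still be narrowed to the 3-column shape
    // by selecting the matching columns and converting with as[...].
    val narrowedDS = detailDS
      .select("DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME", "count")
      .as[FlightSummary]

    assert(summaryDS.columns.length == 3)
    assert(detailDS.columns.length == 5)
    assert(narrowedDS.columns.length == 3)

    spark.stop()
  }
}
```

Keeping one case class per structure keeps each Dataset typed, while select remains available whenever only a subset of columns is needed.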
Upvotes: 2