Reputation: 626
I need to create a new DataFrame from an existing DataFrame, and I need to change the schema as well.
I have a DataFrame like:
+-----+--------+----------+
|Id   |Position|playerName|
+-----+--------+----------+
|10125|Forward |Messi     |
|10126|Forward |Ronaldo   |
|10127|Midfield|Xavi      |
|10128|Midfield|Neymar    |
+-----+--------+----------+
and I created it using the case class given below:
case class caseClass (
Id: Int = 0, // an Int field cannot default to ""
Position : String = "" ,
playerName : String = ""
)
Now I need to put both playerName and Position under a struct type.
That is, I need to create another DataFrame with this schema:
root
|-- Id: integer (nullable = true)
|-- playerDetails: struct (nullable = true)
| |-- playerName: string (nullable = true)
| |-- Position: string (nullable = true)
I wrote the following code to create a new DataFrame, following the article at https://medium.com/@mrpowers/adding-structtype-columns-to-spark-dataframes-b44125409803
myschema was:
List(
StructField("Id", IntegerType, true),
StructField("Position",StringType, true),
StructField("playerName", StringType,true)
)
I tried the following code:
spark.createDataFrame(
spark.sparkContext.parallelize(data),
myschema
)
but I can't make it happen.
I saw a similar question, "Change schema of existing dataframe", but I can't understand the solution.
Is there a way to use a struct type directly inside the case class, so that I don't need to build my own schema to create struct-type values?
Upvotes: 5
Views: 3326
Reputation: 7207
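The "struct" function can be used:
// imports (in spark-shell, spark.implicits._ is already in scope)
import org.apache.spark.sql.functions.struct
import spark.implicits._
// data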
val playersDF = Seq(
(10125, "Forward", "Messi"),
(10126, "Forward", "Ronaldo"),
(10127, "Midfield", "Xavi"),
(10128, "Midfield", "Neymar")
).toDF("Id", "Position", "playerName")
// action
val playersStructuredDF = playersDF.select($"Id", struct("playerName", "Position").as("playerDetails"))
// display
playersStructuredDF.printSchema()
playersStructuredDF.show(false)
Output:
root
|-- Id: integer (nullable = false)
|-- playerDetails: struct (nullable = false)
| |-- playerName: string (nullable = true)
| |-- Position: string (nullable = true)
+-----+------------------+
|Id |playerDetails |
+-----+------------------+
|10125|[Messi, Forward] |
|10126|[Ronaldo, Forward]|
|10127|[Xavi, Midfield] |
|10128|[Neymar, Midfield]|
+-----+------------------+
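As for using a struct type directly inside the case class: Spark derives a struct column from a nested case class, so a hand-written schema should not be needed. The following is only a minimal sketch under some assumptions: the names PlayerDetails, Player and NestedPlayers are hypothetical (they do not come from the question), and a local SparkSession is created just to make the example self-contained.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

// Hypothetical case classes; define them at top level so Spark can derive encoders.
// The nested case class becomes the "playerDetails" struct column.
case class PlayerDetails(playerName: String, Position: String)
case class Player(Id: Int, playerDetails: PlayerDetails)

object NestedPlayers {
  // Nest playerName and Position under playerDetails, then bind rows to Player.
  def nest(spark: SparkSession, flat: DataFrame): Dataset[Player] = {
    import spark.implicits._
    flat
      .select(col("Id"), struct(col("playerName"), col("Position")).as("playerDetails"))
      .as[Player]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run for illustration
      .appName("nested-case-class-sketch")
      .getOrCreate()
    import spark.implicits._

    // Option 1: build the nested structure directly from case class instances.
    val direct = Seq(
      Player(10125, PlayerDetails("Messi", "Forward")),
      Player(10126, PlayerDetails("Ronaldo", "Forward"))
    ).toDS()
    direct.printSchema()

    // Option 2: convert an existing flat DataFrame into the nested, typed form.
    val flat = Seq(
      (10127, "Midfield", "Xavi"),
      (10128, "Midfield", "Neymar")
    ).toDF("Id", "Position", "playerName")
    nest(spark, flat).show(false)

    spark.stop()
  }
}
```

The field names of the nested case class have to match the names inside the struct column, otherwise the .as[Player] conversion fails at analysis time.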
Upvotes: 4