Reputation: 135
Hello guys, I have this function that gets the row values from a DataFrame, converts them into a list, and then makes a DataFrame from it.
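// Needed for the implicit Encoder[String] that createDataset requires
import sparkSession.implicits._
// Get the row contents from the "content" column
val dfList = df.select("content").rdd.map(r => r(0).toString).collect.toList
val dataSet = sparkSession.createDataset(dfList)
// Make a new DataFrame from the JSON strings
sparkSession.read.json(dataSet)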
What do I need to do to make a list with the values of the other columns, so I can have another DataFrame with those column values?
val dfList = df.select("content", "collection", "h").rdd.map(r => {
  println("******ROW********")
  println(r(0).toString)
  println(r(1).toString)
  println(r(2).toString) // These have the row values from the other columns in the select
}).collect.toList
Thanks.
Upvotes: 0
Views: 57
Reputation: 853
The approach doesn't look right: you don't need to collect the DataFrame just to add new columns. Try adding columns directly to the DataFrame using withColumn() or withColumnRenamed(): https://docs.azuredatabricks.net/spark/1.6/sparkr/functions/withColumn.html.
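As a rough sketch of what that could look like in Scala: the sample rows, the local session, and the new column names (`collection_upper`, `hash`) are made up for illustration. Only the column names `content`, `collection` and `h` come from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, upper}

// Local session for illustration only; in the question this is the existing sparkSession.
val spark = SparkSession.builder().master("local[*]").appName("withColumn-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data using the column names from the question.
val df = Seq(
  ("""{"a": 1}""", "users", "h1"),
  ("""{"a": 2}""", "orders", "h2")
).toDF("content", "collection", "h")

// Add a derived column and rename an existing one.
// Both are lazy transformations; nothing is collected to the driver.
val result = df
  .withColumn("collection_upper", upper(col("collection")))
  .withColumnRenamed("h", "hash")

result.show(false)
```

The result is still a distributed DataFrame, so further transformations can be chained onto it.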
If you want to bring in columns from another DataFrame, try a join. In any case, it's not a good idea to use collect, as it will bring all your data to the driver.
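For the JSON case in the question, one possible way to avoid collect is to infer the schema from the "content" column and then parse it in place with from_json, keeping the other columns alongside. This is only a sketch: it assumes Spark 2.2 or later, and the sample rows and the `parsed` column name are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}

// Local session for illustration only; in the question this is the existing sparkSession.
val spark = SparkSession.builder().master("local[*]").appName("from-json-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data using the column names from the question.
val df = Seq(
  ("""{"a": 1}""", "users", "h1"),
  ("""{"a": 2}""", "orders", "h2")
).toDF("content", "collection", "h")

// Infer the JSON schema from the "content" column as a Dataset[String],
// without collecting the rows to the driver.
val schema = spark.read.json(df.select("content").as[String]).schema

// Parse the JSON into a struct column, then flatten it next to the other columns.
val parsed = df
  .withColumn("parsed", from_json(col("content"), schema))
  .select(col("collection"), col("h"), col("parsed.*"))

parsed.show(false)
```

Schema inference still scans the "content" column once. If the schema is known up front, passing it to from_json directly would skip that extra pass.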
Upvotes: 1