JDraper

Reputation: 359

Mapping List items to org.apache.spark.sql.Column type

I am trying to sum a list of columns in my DataFrame (of type org.apache.spark.sql.DataFrame), storing the result in a new column 'sums' of a new DataFrame 'out'.

I can do this quite easily if I list the columns by hand. For example, this works:

val columnsToSum = List(col("led zeppelin"), col("lenny kravitz"), col("leona lewis"), col("lily allen"))
val out = df3.withColumn("sums", columnsToSum.reduce(_ + _))

However, if I pull the column names directly from the DataFrame's schema, the items in the list are Strings rather than Columns, and the same approach fails. For example:

val columnsToSum = df2.schema.fields.filter(f => f.dataType.isInstanceOf[StringType]).map(_.name).patch(0, Nil, 1).toList // patch(0, Nil, 1) drops the first name ("user")
println(columnsToSum)
>> List(a perfect circle, abba, ac/dc, adam green, aerosmith, afi, ...

// Trying the same method
val out = df3.withColumn("sums", columnsToSum.reduce(_ + _))

>> found   : String
 required: org.apache.spark.sql.Column
val out = df3.withColumn("sums", columnsToSum.reduce(_ + _))

How do I do this type of conversion? I've tried:

List(a perfect circle, abba, ac/dc, ...).map(_.Column)
List(a perfect circle, abba, ac/dc, ...).map(_.spark.sql.Column)
List(a perfect circle, abba, ac/dc, ...).map(_.org.apache.spark.sql.Column)

None of these worked. Thanks in advance.

Upvotes: 2

Views: 354

Answers (1)

Krzysztof Atłasik

Reputation: 22635

You can get a Column object from a String by using the col function from org.apache.spark.sql.functions (you are already using it in your first snippet).

So this should work:

columnsToSum.map(col).reduce(_ + _)

or, as a more verbose version:

columnsToSum.map(c => col(c)).reduce(_ + _)
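
For context, here is a minimal end-to-end sketch of the same idea. The object name SumColumnsSketch, the helper withSums, the local SparkSession and the small integer DataFrame are all assumptions made up for illustration; they stand in for your df3 and are not part of your code.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SumColumnsSketch {
  // Hypothetical helper: turns each name into a Column with col,
  // then folds the Columns together with +.
  // Note: reduce throws on an empty list, so names must be non-empty.
  def withSums(df: DataFrame, names: Seq[String]): DataFrame = {
    val total: Column = names.map(col).reduce(_ + _)
    df.withColumn("sums", total)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("sum-columns-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up data standing in for df3.
    val df = Seq(("u1", 1, 2, 3), ("u2", 4, 5, 6))
      .toDF("user", "abba", "aerosmith", "afi")

    // Every column name except "user", as plain Strings.
    val columnsToSum: List[String] = df.columns.filter(_ != "user").toList

    // sums is 6 for u1 and 15 for u2.
    val out = withSums(df, columnsToSum)
    out.show()

    spark.stop()
  }
}
```

The original attempt compiled down to string concatenation: reduce(_ + _) on a List[String] yields a single String, which is why withColumn reported "found: String, required: Column". Mapping with col first means + is the Column operator instead.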

Upvotes: 2
