sparkscala

Reputation: 71

spark dataframe filter and select

I have a Spark Scala DataFrame and need to filter its elements based on a condition and select the count.

  val filter = df.groupBy("user").count().alias("cnt")
  val count = filter.filter(col("user") === ("subscriber").select("cnt")

The error I am facing is value select is not a member of org.apache.spark.sql.Column. Also, for some reason, count is a Dataset[Row]. Any thoughts on how to get the count in a single line?

Upvotes: 1

Views: 4272

Answers (2)

Ram Ghadiyaram

Reputation: 29165

Dataset[Row] is DataFrame

Dataset[Row] is just a type alias for DataFrame, so no need to worry... your result is a DataFrame.

See this for a better understanding: Difference between DataFrame, Dataset, and RDD in Spark

Regarding select is not a member of org.apache.spark.sql.Column, it is purely a compile error.

 val filter = df.groupBy("user").count().alias("cnt")
 val count = filter.filter(col("user") === "subscriber")
   .select("cnt")

will work, since you were missing the closing ) brace for filter.
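As a side note, since the question also asks for the count itself: once the parenthesis is fixed, the aggregated value can be pulled out of the one-row result with first(). A minimal sketch, assuming a local SparkSession and a small sample dataset standing in for the asker's df (note that groupBy(...).count() names its column "count"; calling .alias(...) on the result aliases the Dataset, not the column):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
import spark.implicits._

// Sample data standing in for the asker's df (assumption)
val df = Seq("subscriber", "subscriber", "guest").toDF("user")

val counts = df.groupBy("user").count()            // columns: user, count
val cnt = counts.filter(col("user") === "subscriber")
  .select("count")                                 // the aggregate column is named "count"
  .first()
  .getLong(0)                                      // plain Long instead of a Dataset[Row]
```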

Upvotes: 2

s.polam

Reputation: 10372

You are missing a ")" before .select; please check the code below.

The Column class doesn't have a .select method; you have to invoke select on a DataFrame.

val filter = df.groupBy("user").count().alias("cnt")
val count = filter.filter(col("user") === "subscriber").select("cnt")
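To get a single-line count as asked, filtering before aggregating avoids the groupBy entirely, since Dataset.count() already returns a plain Long. A sketch under the same assumption of a df with a user column (sample data is made up for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
import spark.implicits._

// Sample data standing in for the asker's df (assumption)
val df = Seq("subscriber", "subscriber", "guest").toDF("user")

// filter first, then count: no select needed, result is a Long
val subscriberCount: Long = df.filter(col("user") === "subscriber").count()
```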

Upvotes: 1
