Foaad Mohamad Haddod

Reputation: 133

Spark: calculate percentages of a column's values

I am trying to improve my Spark Scala skills and I have a case which I cannot find a way to handle, so please advise!

I have the original data as shown in the figure below:

[figure: the original data, with a Code column and a count column]

I want to calculate the percentage of every value in the count column. E.g. the last error value is 64; what is 64 as a percentage of the sum of all the column's values? Please note that I am reading the original data as DataFrames using sqlContext. Here is my code:

    val df1 = df.groupBy("Code")
      .agg(sum("count").alias("sum"),
           mean("count").multiply(100).cast("integer").alias("percentage"))

I want results similar to this:

[figure: the desired output, with a percentage column added]

Thanks in advance!

Upvotes: 10

Views: 15488

Answers (1)

user8811088

Reputation: 141

Use agg to get each code's total, then a window function over the entire result to divide by the grand total:

import org.apache.spark.sql.expressions._
import org.apache.spark.sql.functions._

df
  .groupBy("code")
  .agg(sum("count").alias("count"))
  // over() with an empty window spec spans every row of the result,
  // so the windowed sum is the grand total of all counts.
  .withColumn("fraction", col("count") / sum("count").over())
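
For completeness, here is a minimal runnable sketch of the same idea that also converts the fraction to a rounded percentage. The sample rows and the lowercase code column name are assumptions for illustration, not the asker's actual data:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object CodePercentages {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("code-percentages")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the original data.
    val df = Seq(
      ("error",   64L),
      ("warning", 128L),
      ("info",    320L)
    ).toDF("code", "count")

    val result = df
      .groupBy("code")
      .agg(sum("count").alias("count"))
      // Window.partitionBy() with no arguments covers every row, which is
      // equivalent to the bare over() above: the windowed sum is the grand total.
      .withColumn("percentage",
        round(col("count") / sum("count").over(Window.partitionBy()) * 100, 2))

    result.show()
    // For this sample (total = 512): error 12.5, warning 25.0, info 62.5.
    // Row order of show() may vary.
    spark.stop()
  }
}

One caveat: an unpartitioned window moves all rows to a single partition to compute the total, which is fine here because the input is already aggregated down to one row per code.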

Upvotes: 14
