Ravikumar Reddy Yeruva
Ravikumar Reddy Yeruva

Reputation: 117

How to add remarks column using scala

I have dataFrame as below and want to add remarks using Scala

id    val  visits 
111   2        1
111   2        1
112   4        2
112   5        4
113   6        1

Expected output should be below

id    val    visits   remarks
111   2        1      Ramdom
111   2        1      Ramdom
112   4        2      Less visit
112   5        4      More visit
113   6        1      One visit

Remarks should be:
Ramdom for Id has two records with same value & visits
One Visit for Id has only one record which contains any no of visits
Less Visit for Id has two records with less visits compared to other
More Visit for Id has more than one record with different value and visits.

Upvotes: 0

Views: 70

Answers (1)

SCouto
SCouto

Reputation: 7928

May not be the best solution but it's a working one:

First group your dataFrame by val and visits and the count of them

val grouped = df.groupBy("id").agg(max($"val").as("maxVal"), max($"visits").as("maxVisits"), min($"val").as("minVal"), min($"visits").as("minVisits"), count($"id").as("count"))

Then define a UDF where you implement your logic:

val remarks = functions.udf ((value: Int, visits: Int, maxValue: Int, maxVisits: Int, minValue: Int, minVisits: Int, count: Int) =>
   if (count == 1) {
     "One Visit"
   }else if (value == maxValue && value == minValue && visits == maxVisits && visits == minVisits) {
     "Random"
   }else {
     if (visits < maxVisits) {
       "Less Visits"
     }else {
       "More Visits"
     }
   }
 )

Then join the original dataFrame and the grouped one by id and add the desired column with the UDF. Finally remove the undesired columns from the output:

 df.join(grouped, Seq("id"))
   .withColumn("remarks", remarks($"val", $"visits", $"maxVal", $"maxVisits", $"minVal", $"minVisits", $"count"))
   .drop("maxVal","maxVisits", "minVal", "minVisits", "count")

Output:

+---+----+-------+-----------+
| id| val| visits|    remarks|
+---+----+-------+-----------+
|112|   4|      2|Less Visits|
|112|   5|      4|More Visits|
|113|   6|      1|  One Visit|
|111|   2|      1|     Random|
|111|   2|      1|     Random|
+---+----+-------+-----------+

P.S. remember to import functions

import org.apache.spark.sql.functions

Upvotes: 2

Related Questions