Ankita

Spark SQL: How to filter records by multiple columns and use groupBy too

//dataset
 michael,swimming,silve,2016,USA
 usha,running,silver,2014,IND
 lisa,javellin,gold,2014,USA
 michael,swimming,silver,2017,USA

Questions: 1) How many silver medals have been won by the USA in each sport? My code throws the error "value === is not a member of String":

import spark.implicits._ // needed for toDF (already imported in spark-shell)
val rdd = sc.textFile("/home/training/mydata/sports.txt")
val text = rdd.map(lines => lines.split(",")).map(arrays => (arrays(0), arrays(1), arrays(2), arrays(3), arrays(4))).toDF("first_name", "sports", "medal_type", "year", "country")

text.filter(text("medal_type")==="silver" && ("country")==="USA" groupBy("year").count().show        

2) What is the difference between === and ==? When I use filter and select with === and a single condition (no && or ||), filter shows me the matching string rows and select shows me a boolean result, respectively. But when I use select and filter with ==, an error is thrown.

Answers (1)

yakout

Use this:

text.filter(text("medal_type")==="silver" && text("country")==="USA").groupBy("year").count().show

+----+-----+
|year|count|
+----+-----+
|2017|    1|
+----+-----+

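That answers your first question. (Note that there is a typo in the first line of the dataset, "silve" instead of "silver", so only the 2017 row matches.) Your original error came from ("country")==="USA": ("country") is just a String in parentheses, and === is a method on Column, not on String, so it has to be text("country"). Your filter(...) call was also missing its closing parenthesis before .groupBy.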
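A self-contained sketch of the same query, assuming Spark 2.x or 3.x running in local mode. It builds the four sample rows in memory with toDF instead of reading sports.txt, uses functions.col (equivalent to text(...) here) for both comparisons, and groups by year and, because the question asks about each sport, by sports as well. The object and method names (SilverMedals, sample, usaSilverCounts) are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SilverMedals {
  // The four sample rows from the question, "silve" typo included,
  // built in memory instead of reading /home/training/mydata/sports.txt
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("michael", "swimming", "silve", "2016", "USA"),
      ("usha", "running", "silver", "2014", "IND"),
      ("lisa", "javellin", "gold", "2014", "USA"),
      ("michael", "swimming", "silver", "2017", "USA")
    ).toDF("first_name", "sports", "medal_type", "year", "country")
  }

  // Counts USA silver medals per value of groupCol (e.g. "year" or "sports").
  // Each side of && is a Column because it is built with col(...) === ...
  def usaSilverCounts(df: DataFrame, groupCol: String): Map[String, Long] =
    df.filter(col("medal_type") === "silver" && col("country") === "USA")
      .groupBy(groupCol)
      .count()
      .collect()
      .map(row => row.getString(0) -> row.getLong(1))
      .toMap

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("silver-medals")
      .getOrCreate()
    val df = sample(spark)
    println(usaSilverCounts(df, "year"))
    println(usaSilverCounts(df, "sports"))
    spark.stop()
  }
}
```

Because of the "silve" typo, both groupings find a single medal (2017 and swimming). Collecting into a Map only makes sense for tiny results like this one; on real data you would keep .show() or write the DataFrame out.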

About the second question:

== and === are both just functions (methods) in Scala.

In Spark, === is the equality test on Column and is equivalent to the equalTo method: https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Column.html#equalTo-java.lang.Object-

   // Scala:
   df.filter( df("colA") === df("colB") )

   // Java
   import static org.apache.spark.sql.functions.*;
   df.filter( col("colA").equalTo(col("colB")) );

By contrast, == uses the equals method, which compares the two objects themselves (for a Column, its underlying expression) rather than the values in each row: https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Column.html#equals-java.lang.Object-

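Notice the return types: == (equals) returns a single Boolean, while === (equalTo) returns a Column of per-row results. filter and select expect a Column (or a String), so passing them the Boolean produced by == fails to compile.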
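A minimal sketch of that difference, assuming Spark 2.x or 3.x in local mode; EqualityDemo and namesWhere are hypothetical names. The predicates built with === and equalTo are Columns that Spark evaluates for every row, while == yields one ordinary Scala Boolean computed on the driver.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object EqualityDemo {
  // Returns first_name for every row where the Column predicate is true
  def namesWhere(df: DataFrame, predicate: Column): List[String] =
    df.filter(predicate).select("first_name").collect().map(_.getString(0)).toList

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("equality-demo")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("usha", "silver"), ("lisa", "gold")).toDF("first_name", "medal_type")

    // === and its alias equalTo both return a Column, an expression evaluated per row
    val viaTripleEquals: Column = col("medal_type") === "silver"
    val viaEqualTo: Column = col("medal_type").equalTo("silver")
    println(namesWhere(df, viaTripleEquals)) // List(usha)
    println(namesWhere(df, viaEqualTo))      // List(usha)
    df.select(viaEqualTo).show()             // one true/false value per row

    // == is plain Scala equality: Column.equals compares the two Column objects
    // (their expressions), so the result is a single Boolean, not a per-row test
    val viaDoubleEquals: Boolean = col("medal_type") == lit("silver")
    println(viaDoubleEquals) // false

    // This would not compile, because filter has no overload that takes a Boolean:
    // df.filter(col("medal_type") == lit("silver"))

    spark.stop()
  }
}
```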
