Reputation: 490
//dataset
michael,swimming,silve,2016,USA
usha,running,silver,2014,IND
lisa,javellin,gold,2014,USA
michael,swimming,silver,2017,USA
Questions: 1) How many silver medals has the USA won in each sport? The code below throws the error: value === is not a member of String
val rdd = sc.textFile("/home/training/mydata/sports.txt")
val text = rdd.map(lines => lines.split(",")).map(arrays => (arrays(0), arrays(1), arrays(2), arrays(3), arrays(4))).toDF("first_name", "sports", "medal_type", "year", "country")
text.filter(text("medal_type")==="silver" && ("country")==="USA" groupBy("year").count().show
2) What is the difference between === and ==? When I use filter and select with === and a single condition (no && or ||), they show me the string result and the boolean result respectively, but when I use select and filter with ==, an error is thrown.
Upvotes: 0
Views: 867
Reputation: 862
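The error occurs because ("country") is a plain String, not a Column, and String has no === method. Refer to the column as text("country") instead: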
text.filter(text("medal_type")==="silver" && text("country")==="USA").groupBy("year").count().show
+----+-----+
|year|count|
+----+-----+
|2017|    1|
+----+-----+
This answers your first question. (Note the typo in the first line of the dataset: "silve" instead of "silver", so the 2016 row does not match the filter.)
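Since the question asks for the count per sport, a variant that groups by sports instead of year might look like the sketch below. It assumes a local SparkSession and inlines the rows from the question rather than reading /home/training/mydata/sports.txt; the names spark and silverBySport are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session; in spark-shell a session named `spark` already exists.
val spark = SparkSession.builder().appName("silver-by-sport").master("local[*]").getOrCreate()
import spark.implicits._

// Same rows as the dataset in the question, including the "silve" typo.
val text = Seq(
  ("michael", "swimming", "silve", "2016", "USA"),
  ("usha", "running", "silver", "2014", "IND"),
  ("lisa", "javellin", "gold", "2014", "USA"),
  ("michael", "swimming", "silver", "2017", "USA")
).toDF("first_name", "sports", "medal_type", "year", "country")

// Both sides of && are Columns, so === resolves on each of them.
val silverBySport = text
  .filter(col("medal_type") === "silver" && col("country") === "USA")
  .groupBy("sports")
  .count()

silverBySport.show()
```

Here col("country") and text("country") are interchangeable; either one yields a Column, which is what === needs on its left-hand side.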
About the second question:
== and === are both just methods in Scala.
In Spark, === calls the equalTo method, which is the column equality test: https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Column.html#equalTo-java.lang.Object-
// Scala:
df.filter( df("colA") === df("colB") )
// Java
import static org.apache.spark.sql.functions.*;
df.filter( col("colA").equalTo(col("colB")) );
== calls the equals method, which compares the two Column objects themselves rather than the values in their rows: https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Column.html#equals-java.lang.Object-
Notice the return type of each: == (equals) returns a Boolean, while === (equalTo) returns a Column holding the per-row results. filter and select expect a Column, which is why they fail when given the Boolean produced by ==.
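To make the difference in return types concrete, here is a toy model in plain Scala with no Spark dependency. MiniColumn, Ref, Lit and EqualTo are hypothetical stand-ins invented for this sketch; they are not Spark classes and only mimic the idea that === builds an expression while == evaluates immediately. It is meant to be pasted into the Scala REPL or run as a script.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Spark classes.
sealed trait Expr
final case class Ref(name: String) extends Expr
final case class Lit(value: Any) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr

final class MiniColumn(val expr: Expr) {
  // Like ===, this builds a description of a comparison
  // instead of evaluating anything right away.
  def ===(other: Any): MiniColumn = other match {
    case c: MiniColumn => new MiniColumn(EqualTo(expr, c.expr))
    case v             => new MiniColumn(EqualTo(expr, Lit(v)))
  }
}

val medal = new MiniColumn(Ref("medal_type"))

// === returns another column-like object wrapping the comparison.
val viaTripleEquals: MiniColumn = medal === "silver"

// == is ordinary Scala equality and returns a Boolean on the spot.
// The compiler may warn that a MiniColumn and a String never compare equal.
val viaDoubleEquals: Boolean = medal == "silver"
```

A method that accepts only a MiniColumn could take viaTripleEquals but not viaDoubleEquals, which mirrors why filter accepts the result of === and rejects the result of ==.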
Upvotes: 1