user2205916

Reputation: 3456

Scala Spark: Filter rows based on values in a column of Floats

Why isn't the following code working? I am trying to keep only the rows whose hrs value lies in [10.0, 100.0]. Both of the following attempts produce the same result shown below. Do I need to cast() or something?

Solution 1:

dff1.select("hrs").filter(col("hrs").geq(lit("10")) && 
                          col("hrs").leq(lit("100")) ).show(10, truncate = false)

Solution 2:

dff1.select("hrs").filter(col("hrs") >= lit("10") && 
                          col("hrs") <= lit("100") ).show(10, truncate = false)

Result:

+------------------+
|hrs               |
+------------------+
|239.78444444444443|
|24.459444444444443|
|238.05944444444444|
|45.05138888888889 |
|213.6225          |
|20.04388888888889 |
|201.45333333333335|
|4393.384166666667 |
|260.2611111111111 |
|47.83083333333333 |
+------------------+

Upvotes: 0

Views: 1413

Answers (2)

stack0114106

Reputation: 8711

It is better to use an SQL expression for the filter. The expression is the same as what you would write in a SQL "where" clause: leave integer/float constants as they are and wrap string constants in single quotes.

So your transformation becomes:

dff1.select("hrs").filter(" hrs >= 10 and hrs <= 100 ")

Upvotes: 1

mck

Reputation: 42352

lit is not necessary for integers or floats:

dff1.select("hrs").filter(col("hrs") >= 10 && col("hrs") <= 100)

should also work.
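
As a side note (not part of the original answer), Column also provides a between method that expresses the same inclusive range check in one call; a minimal equivalent sketch, assuming the same dff1 DataFrame:

import org.apache.spark.sql.functions.col

dff1.select("hrs").filter(col("hrs").between(10, 100)).show(10, truncate = false)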

Upvotes: 1
