test acc

Reputation: 591

Possible to filter Spark dataframe by ISNUMERIC function?

I have a DataFrame loaded from a SQL table. I want to filter this DataFrame based on whether the values in a certain column are numeric.

val df = sqlContext.sql("select * from myTable");
val filter = df.filter("ISNUMERIC('col_a')");

I want filter to be a DataFrame containing only the rows of df where the value in col_a is numeric.

My current solution doesn't work. How can I achieve this?

Upvotes: 5

Views: 17957

Answers (4)

user13706467

Reputation: 1

spark.sql("select phone_number, (CASE WHEN LENGTH(REGEXP_REPLACE(phone_number),'[^0-9]', '')) = LENGTH(TRIM(phone_number)) THEN true ELSE false END) as phone_number_isNumeric from table").show()

This is really an old post, but still if anybody looking for alternate solution.

REGEXP_REPLACE(phone_number),'[^0-9]', '' 

removes all characters except numeric
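For reference, a minimal DataFrame-API sketch of the same idea (the column name phone_number comes from the answer; df and the data are illustrative assumptions):

import org.apache.spark.sql.functions.{col, length, regexp_replace, trim}

// A value counts as numeric if stripping non-digits leaves the trimmed string unchanged
val flagged = df.withColumn(
  "phone_number_isNumeric",
  length(regexp_replace(col("phone_number"), "[^0-9]", "")) === length(trim(col("phone_number")))
)
flagged.show()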

Upvotes: -1

Alex Savitsky

Reputation: 2371

You can cast the field in question to DECIMAL and inspect the result:

filter("CAST(col_a AS DECIMAL) IS NOT NULL")

Optionally, you can pass precision and scale to narrow the valid numbers down to a specific maximum length and decimal accuracy:

filter("CAST(col_a AS DECIMAL(18,8)) IS NOT NULL")

Upvotes: 1

bsplosion

Reputation: 2866

Shankar Koirala's answer covers integers effectively. The regex below would cover use cases requiring doubles, with optional negative signing and handling of nulls (note that this is a Java variation):

import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.sql.Row;

// Keep non-null values matching: optional minus sign, digits, optional fractional part
df.filter(df.col("col_a").isNotNull())
  .filter((FilterFunction<Row>) row ->
      row.getString(row.fieldIndex("col_a")).matches("-?\\d+\\.?\\d*"));

Upvotes: 0

koiralo

Reputation: 23109

You can filter it as follows:

df.filter(row => row.getAs[String]("col_a").matches("""\d+"""))
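Note that \d+ matches non-negative integers only, and getAs returns null for missing values, which makes matches throw a NullPointerException. A null-safe variant that also accepts signed decimals (an assumption about what counts as numeric, not part of the original answer):

// Option(...) short-circuits nulls; the regex allows an optional sign and fractional part
df.filter(row => Option(row.getAs[String]("col_a")).exists(_.matches("""-?\d+(\.\d+)?""")))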

Hope this helps!

Upvotes: 3
