Reputation: 591
I have a DataFrame for a table in SQL. I want to filter this DataFrame based on whether the value of a certain column is numeric.
val df = sqlContext.sql("select * from myTable");
val filter = df.filter("ISNUMERIC('col_a')");
I want `filter` to be a DataFrame containing the rows of `df` where the values in `col_a` are numeric. My current solution doesn't work. How can I achieve this?
Upvotes: 5
Views: 17957
Reputation: 1
spark.sql("select phone_number, (CASE WHEN LENGTH(REGEXP_REPLACE(phone_number, '[^0-9]', '')) = LENGTH(TRIM(phone_number)) THEN true ELSE false END) as phone_number_isNumeric from table").show()
This is an old post, but here is an alternate solution in case anybody is still looking for one.
REGEXP_REPLACE(phone_number, '[^0-9]', '')
removes all characters except digits, so the length comparison succeeds only when every non-whitespace character is a digit.
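The same length-comparison check can be sketched in plain Java for illustration (the `IsNumericCheck` class and `isNumeric` method are hypothetical names, not part of any Spark API):

```java
// Sketch of the check the SQL above performs per value:
// strip every non-digit, then compare the remaining length
// to the length of the trimmed input.
public class IsNumericCheck {
    static boolean isNumeric(String s) {
        String digitsOnly = s.replaceAll("[^0-9]", "");
        return digitsOnly.length() == s.trim().length();
    }

    public static void main(String[] args) {
        System.out.println(isNumeric("12345")); // all digits
        System.out.println(isNumeric("12a45")); // contains a letter
    }
}
```

Note that, as in the SQL version, an empty or whitespace-only string passes this check (both lengths are zero), so you may want an extra non-empty test.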
Upvotes: -1
Reputation: 2371
You can cast the field in question to DECIMAL
and inspect the result:
filter("CAST(col_a AS DECIMAL) IS NOT NULL")
Optionally, you can pass length and/or precision to narrow down the valid numbers to a specific maximum length:
filter("CAST(col_a AS DECIMAL(18,8)) IS NOT NULL")
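A rough plain-Java sketch of the per-value behavior this relies on: a parseable string casts to a number, anything else becomes null, so `IS NOT NULL` keeps only the numeric rows. The `DecimalCast` class and `castToDecimal` helper are hypothetical, and Spark's actual CAST rules differ in some edge cases:

```java
import java.math.BigDecimal;

// Illustration of CAST(col_a AS DECIMAL) semantics per value:
// parseable strings yield a number, everything else yields null.
public class DecimalCast {
    static BigDecimal castToDecimal(String s) {
        try {
            return new BigDecimal(s.trim());
        } catch (NumberFormatException e) {
            // Non-numeric values become null, so an
            // "IS NOT NULL" filter drops them.
            return null;
        }
    }
}
```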
Upvotes: 1
Reputation: 2866
Shankar Koirala's answer covers integers effectively. The regex below covers use cases requiring doubles, with an optional negative sign and handling of nulls (note that this is a Java variation):
df.filter(df.col("col_a").isNotNull())
  .filter((FilterFunction<Row>) row ->
      row.getString(row.fieldIndex("col_a")).matches("-?\\d+\\.?\\d*"))
Upvotes: 0
Reputation: 23109
You can filter it as
df.filter(row => row.getAs[String]("col_a").matches("""\d+"""))
Hope this helps!
Upvotes: 3