silent

Reputation: 16138

PySpark SQL TRY_CAST?

I have data in a DataFrame with all columns as strings. Some of the values in one column are numeric, so I could cast those to float; other rows contain actual strings that I do not want to cast.

So I was looking for something like a TRY_CAST, and tried building it with .when().otherwise(), but haven't succeeded so far.

from pyspark.sql.functions import col, when

casted = data.select(
    when(col("Value").cast("float").isNotNull(), col("Value").cast("float"))
    .otherwise(col("Value"))
)

This does not work: the resulting column is never actually a float.

Is something like this generally possible in a performant way, i.e. without UDFs?

Upvotes: 0

Views: 9642

Answers (1)

Mariusz

Reputation: 13926

A column in Spark cannot hold two types at once: it is either float or string. That is why your result column always ends up as string — it has to be able to hold both the strings and the floats.

What your code does is: if the value in the Value column can be cast to float, it is cast to float and then back to string to match the column type (try a number with more than 6 decimal places and you'll see the representation change). As far as I know, TRY_CAST converts to the target type or returns NULL on failure (at least in SQL Server), and that is exactly what Spark's cast already does.

Upvotes: 4
