Reputation: 16138
I have data in a DataFrame with all columns typed as strings. Some of the values in one column are numeric, so I could cast them to float; other rows of the same column actually contain strings that I do not want to cast.
So I was looking for something like a try_cast, and already tried building something with .when().otherwise(),
but without success so far.
from pyspark.sql.functions import col, when
casted = data.select(when(col("Value").cast("float").isNotNull(), col("Value").cast("float")).otherwise(col("Value")))
This does not work; the resulting column is never actually cast.
Is something like this possible at all (in a performant manner, without UDFs etc.)?
Upvotes: 0
Views: 9642
Reputation: 13926
You can't have a column with two types in Spark: it is either float or string. That's why your column always ends up with string
type (it has to be able to hold both strings and floats).
What your code actually does is: if the value in the Value
column can be cast to float, it is cast to float and then back to string (try it with more than 6 decimal places to see the precision loss). As far as I know, TRY_CAST converts to a value or NULL (at least in SQL Server), so this is exactly what Spark's cast
already does.
Upvotes: 4