Reputation: 2978
I have the following PySpark dataframe:
df = spark.createDataFrame(
    [
        ('31,2', 'foo'),
        ('33,1', 'bar'),
    ],
    ['cost', 'label']
)
I need to cast the `cost` column to float. I do it as follows:
df = df.withColumn('cost', df.cost.cast('float'))
However, as a result I get null values instead of numbers in the cost column.
How can I convert cost to float numbers?
Upvotes: 0
Views: 1963
Reputation: 1023
I think a simple lambda expression should take care of most things.
df.loc[:, 'cost'] = df.cost.apply(lambda x: float(x.replace(',', '.')))
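Note that .loc and .apply are pandas APIs. If df is the PySpark DataFrame from the question, the same lambda idea can be expressed as a UDF; a minimal sketch:
from pyspark.sql import functions as F
from pyspark.sql.types import FloatType

# Wrap the comma-to-dot conversion and float() call in a UDF,
# then apply it to the cost column
to_float = F.udf(lambda x: float(x.replace(',', '.')), FloatType())
df = df.withColumn('cost', to_float(df.cost))
A built-in function such as regexp_replace (see the other answer) will generally be faster than a Python UDF, since it avoids serializing each value to Python.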
Upvotes: 1
Reputation: 216
This should work for you.
from pyspark.sql import functions as F

df = (df
      .withColumn('cost', F.regexp_replace(df.cost, ',', '.'))
      .withColumn('cost', F.col('cost').cast('float')))
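The two steps can also be combined into a single withColumn; a sketch assuming the same import:
# Replace the decimal comma with a dot, then cast the result in one expression
df = df.withColumn('cost', F.regexp_replace(df.cost, ',', '.').cast('float'))
# cost now holds 31.2 and 33.1 as floats instead of null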
Upvotes: 2