Fluxy

Reputation: 2978

How to cast a string column to float in PySpark?

I have the following PySpark dataframe:

df = spark.createDataFrame(
    [
        ('31,2', 'foo'),
        ('33,1', 'bar'),
    ],
    ['cost', 'label']
)

I need to cast the 'cost' column to float. I do it as follows:

df = df.withColumn('cost', df.cost.cast('float'))

However, as a result I get null values instead of numbers in the cost column.
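
The null appears even for a single value (a quick check with the same spark session; check is just a throwaway name):

    # Casting the comma-formatted string directly produces null
    check = spark.createDataFrame([('31,2',)], ['cost'])
    check.select(check.cost.cast('float')).show()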

How can I convert cost to float numbers?

Upvotes: 0

Views: 1963

Answers (2)

Ben Y

Reputation: 1023

I think a simple lambda expression should take care of most things. The one-liner below uses the pandas API:

    df.loc[:, 'cost'] = df.cost.apply(lambda x: float(x.replace(',', '.')))
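
For a PySpark DataFrame like the one in the question, the same lambda can be wrapped in a UDF instead; a minimal sketch, assuming the column has no nulls (the helper name to_float is just for illustration):

    from pyspark.sql import functions as F
    from pyspark.sql.types import FloatType

    # Wrap the comma-to-dot lambda in a UDF and apply it to the column.
    # Note: a plain Python UDF like this will fail on null values.
    to_float = F.udf(lambda x: float(x.replace(',', '.')), FloatType())
    df = df.withColumn('cost', to_float(df.cost))

The built-in regexp_replace plus cast, as in the other answer here, avoids the Python UDF overhead.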

Upvotes: 1

CharlieBONS

Reputation: 216

This should work for you.

from pyspark.sql import functions as F

# Replace the decimal comma first, then the cast parses the value correctly
df = (df.withColumn('cost', F.regexp_replace(df.cost, ',', '.'))
        .withColumn('cost', F.col('cost').cast('float')))
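
With the sample data from the question, the result can be verified quickly:

    df.printSchema()   # cost is now float
    df.show()          # 31.2 and 33.1 instead of null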

Upvotes: 2
