RKW

Reputation: 1

How to return null when an integer column contains an incorrect value in PySpark

I am reading data from a CSV file into a dataframe. For one column I expect integer values, but the source contains some incorrect values.

Col1
------
1234
2346
ab45
12.30

Using cast('int') I get the following:

Col_new
------
1234
2346
null
12

I am looking for the output below:

Col_new
-------
1234
2346
null
null

Upvotes: 0

Views: 944

Answers (1)

mck

Reputation: 42352

You can check whether the column contains a dot (.):

import pyspark.sql.functions as F

# Cast to int only when the value has no decimal point;
# when() without otherwise() returns null for non-matching rows,
# and cast('int') itself yields null for non-numeric strings like 'ab45'.
df2 = df.withColumn(
    'col_new', 
    F.when(
        ~F.col('col1').contains('.'), 
        F.col('col1').cast('int')
    )
)

df2.show()
+-----+-------+
| col1|col_new|
+-----+-------+
| 1234|   1234|
| 2346|   2346|
| ab45|   null|
|12.30|   null|
+-----+-------+
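A stricter alternative (my suggestion, not part of the answer above) is to cast only when the whole string looks like an integer, e.g. F.when(F.col('col1').rlike(r'^-?\d+$'), F.col('col1').cast('int')). The predicate itself can be sketched in plain Python:

```python
import re

# Hypothetical helper mirroring the rlike pattern: accept an optional
# leading sign followed by digits only, otherwise return None.
_INT_RE = re.compile(r'^-?\d+$')

def to_int_or_none(s):
    """Return int(s) if s is a whole number string, else None."""
    return int(s) if _INT_RE.match(s) else None

for v in ['1234', '2346', 'ab45', '12.30']:
    print(v, '->', to_int_or_none(v))
# 1234 -> 1234
# 2346 -> 2346
# ab45 -> None
# 12.30 -> None
```

This rejects anything that is not purely digits, so values such as '12.30' and 'ab45' both map to null rather than relying on the dot check plus the cast's null fallback.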

Upvotes: 1
