Reputation: 1
I am reading data from a CSV file into a DataFrame. For one column I expect integer values, but the source contains some invalid values.
Col1
------
1234
2346
ab45
12.30
Using cast('int'), I get the following:
Col_new
------
1234
2346
null
12
I am looking for the following output:
Col_new
-------
1234
2346
null
null
Upvotes: 0
Views: 944
Reputation: 42352
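You can check whether the column contains a '.' and cast to int only when it does not. Rows that fail the check get null, because the when() has no otherwise():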
import pyspark.sql.functions as F

df2 = df.withColumn(
    'col_new',
    F.when(
        ~F.col('col1').contains('.'),
        F.col('col1').cast('int')
    )
)
df2.show()
+-----+-------+
| col1|col_new|
+-----+-------+
| 1234|   1234|
| 2346|   2346|
| ab45|   null|
|12.30|   null|
+-----+-------+
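If the column might contain other non-integer forms, a stricter rule is to accept only strings made up of an optional sign followed by digits. Here is a minimal sketch of that rule in plain Python, so it runs without a Spark session. The names strict_int and INT_PATTERN are hypothetical and not part of PySpark, and the sketch assumes the values have no surrounding whitespace.

```python
import re

# Hypothetical pattern: an optional sign followed by one or more ASCII digits.
INT_PATTERN = re.compile(r'[+-]?[0-9]+')

def strict_int(value):
    """Return int(value) for purely integer strings, otherwise None."""
    # fullmatch requires the whole string to match, so '12.30' and 'ab45' fail.
    if value is None or not INT_PATTERN.fullmatch(value):
        return None
    return int(value)

rows = ['1234', '2346', 'ab45', '12.30']
print([strict_int(v) for v in rows])  # [1234, 2346, None, None]
```

In the DataFrame version, the same idea would probably mean replacing the contains check with F.col('col1').rlike('^[+-]?[0-9]+$'). The anchors are needed because rlike matches anywhere in the string.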
Upvotes: 1