Jatin

Reputation: 113

Converting String to Integer Returns null in PySpark

I am trying to convert a string to integer in my PySpark code.

Input: "1670900472389" (the value is stored as a string)

I am doing this but it's returning null.

df = df.withColumn("lastupdatedtime_new",col("lastupdatedtime").cast(IntegerType()))

I have read the related posts on Stack Overflow. In those cases, quotes or commas in the input string caused the null. However, that's not the case with my input string. Any ideas what's happening?

Upvotes: 1

Views: 4412

Answers (1)

Azhar Khan

Reputation: 4098

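The maximum value that a Java integer can hold is 2147483647, i.e. 2^31 - 1 (a 32-bit signed integer). Spark's IntegerType has the same range, and 1670900472389 is larger than that, so the cast cannot represent the value and returns null.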
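To make the range issue concrete, here is a rough plain-Python sketch of the check, with no Spark session needed. The helper `cast_like_spark` and the constant names are hypothetical and only imitate the behaviour described in the question (out-of-range input becomes null); they are not Spark's actual implementation.

```python
from typing import Optional

# Ranges of a 32-bit and a 64-bit signed integer
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def cast_like_spark(value: str, lo: int, hi: int) -> Optional[int]:
    """Hypothetical helper: parse a string, returning None when it is
    not an integer or falls outside the inclusive range [lo, hi]."""
    try:
        parsed = int(value.strip())
    except ValueError:
        return None
    return parsed if lo <= parsed <= hi else None


ts = "1670900472389"
as_int = cast_like_spark(ts, INT32_MIN, INT32_MAX)   # too large for 32 bits
as_long = cast_like_spark(ts, INT64_MIN, INT64_MAX)  # fits in 64 bits

print(as_int)   # None
print(as_long)  # 1670900472389
```

The value is roughly 778 times larger than the 32-bit maximum, while it fits comfortably in the 64-bit range.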

Use LongType instead:

import pyspark.sql.functions as F
from pyspark.sql.types import LongType

df = spark.createDataFrame(data=[["1670900472389"]], schema=["lastupdatedtime"])

df = df.withColumn("lastupdatedtime_new", F.col("lastupdatedtime").cast(LongType()))

# Display the rows and the resulting schema
df.show(truncate=False)
df.printSchema()

Output:

+---------------+-------------------+
|lastupdatedtime|lastupdatedtime_new|
+---------------+-------------------+
|1670900472389  |1670900472389      |
+---------------+-------------------+

Schema:

root
 |-- lastupdatedtime: string (nullable = true)
 |-- lastupdatedtime_new: long (nullable = true)

Upvotes: 4
