Reputation: 173
For simplicity, I have a table in BigQuery with one field of type NUMERIC. When I try to write a PySpark dataframe with one column to BigQuery, it keeps raising a NullPointerException. I tried casting the PySpark column to int, float, and string, and even encoding it, but it keeps throwing the NullPointerException.

Even after spending 5 to 6 hours, I'm unable to figure out, either by myself or from the internet, what the issue is and what the exact PySpark dataframe column type should be to map to a BigQuery NUMERIC column. Any help or direction would be of great help. Thanks in advance.
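For context, here's a minimal sketch of the kind of write I'm attempting (the table and bucket names are placeholders, and this assumes the spark-bigquery-connector):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('bq-write').getOrCreate()

# Single-column dataframe destined for the BigQuery NUMERIC field
subscriber_df_deu = spark.createDataFrame([(1234567890123,)], ['column'])

# Placeholder table/bucket names; this write raises the NullPointerException
subscriber_df_deu.write.format('bigquery') \
    .option('table', 'my_dataset.my_table') \
    .option('temporaryGcsBucket', 'my-temp-bucket') \
    .mode('append') \
    .save()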
Upvotes: 0
Views: 1622
Reputation: 11
This is due to the range limit of Spark's IntegerType: it can accommodate numbers only up to about 10 digits. To correct this issue, cast the column to the Long data type.
IntegerType: Represents 4-byte signed integer numbers. The range of numbers is from -2147483648 to 2147483647.
https://spark.apache.org/docs/latest/sql-ref-datatypes.html
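For example, a quick sketch of the cast (the dataframe and column names are illustrative):

from pyspark.sql.functions import col
from pyspark.sql.types import LongType

# Cast the 4-byte integer column to an 8-byte long to avoid overflow
df = df.withColumn('column', col('column').cast(LongType()))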
Hope this helps.
Upvotes: 0
Reputation: 173
For anyone who faces the same issue: you just have to cast the column to Spark's DecimalType.
from pyspark.sql.functions import col
from pyspark.sql.types import DecimalType

subscriber_df_deu = subscriber_df_deu.withColumn('column', col('column').cast(DecimalType()))
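Note that DecimalType() defaults to precision 10 and scale 0, so it drops any fractional digits. BigQuery's NUMERIC is effectively DECIMAL(38, 9), so if your data has decimals it may be safer to pass the precision and scale explicitly, e.g.:

from pyspark.sql.functions import col
from pyspark.sql.types import DecimalType

# BigQuery NUMERIC: up to 38 digits of precision, 9 decimal places
subscriber_df_deu = subscriber_df_deu.withColumn(
    'column', col('column').cast(DecimalType(38, 9))
)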
Upvotes: 0