Reputation: 576
I have a DataFrame whose columns are nullable (nullability = True), and I want to set nullability to False in PySpark.
I can do it the way shown below, but I don't want to convert to an RDD, because I'm reading with Structured Streaming and converting to an RDD is not supported there.
def set_df_columns_nullable(self, spark, df, column_list, nullable=True):
    for struct_field in df.schema:
        if struct_field.name in column_list:
            struct_field.nullable = nullable
    df_mod = spark.createDataFrame(df.rdd, df.schema)
    return df_mod
Thanks in Advance
Upvotes: 1
Views: 1105
Reputation: 203
You can actually update column nullability without converting to an RDD, using Spark's internal `AssertNotNull` expression (Scala):

dataFrame
  .withColumn(columnName, new Column(AssertNotNull(col(columnName).expr)))

Note that the above will fail at execution time if the column actually contains null values.
Upvotes: 1