Learnis

Reputation: 576

Convert Spark DataFrame column nullability to False without converting to RDD

I have a DataFrame whose columns have nullability set to True, and I want to change it to False in PySpark.

I can do it the way shown below, but I don't want to go through an RDD because I'm reading with Structured Streaming, and converting to an RDD there is not recommended.

def set_df_columns_nullable(self, spark, df, column_list, nullable=True):
    # Flip the nullable flag on the matching fields of the DataFrame's schema
    for struct_field in df.schema:
        if struct_field.name in column_list:
            struct_field.nullable = nullable
    # Rebuilding the DataFrame from its RDD with the modified schema applies the change
    df_mod = spark.createDataFrame(df.rdd, df.schema)
    return df_mod

Thanks in advance.

Upvotes: 1

Views: 1105

Answers (1)

Wassim Maaoui

Reputation: 203

You can actually update a column's nullability without converting to an RDD:

dataFrame
  .withColumn(columnName, new Column(AssertNotNull(col(columnName).expr)))

source

Note that the above will fail at execution time if the column actually contains null values.
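
For context, here is a minimal self-contained sketch of that trick in Scala (Spark 2.x/3.x). The local session, the sample data, and the id column name are just placeholders, and keep in mind that AssertNotNull lives in Spark's internal Catalyst package, so it is not a stable public API:

import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Columns built from Options are nullable by default
val df = Seq(Some(1), Some(2)).toDF("id")
df.printSchema()          // id: integer (nullable = true)

// Wrapping the column's expression in AssertNotNull flips the flag;
// any row where "id" is actually null fails at execution time
val notNullDf = df.withColumn("id", new Column(AssertNotNull(col("id").expr)))
notNullDf.printSchema()   // id: integer (nullable = false)

The assertion is checked per row at runtime, which is why it only fails when a null value actually shows up rather than at analysis time.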

Upvotes: 1
