Reputation: 11
How can I check whether a DataFrame's columns are null or empty in Spark?
Ex.
type IdentifiedDataFrame = (SourceIdentfier, DataFrame)

def splitRequestIntoDFsWithAndWithoutTransactionId(df: DataFrame): Seq[IdentifiedDataFrame] = {
  Seq(
    (DeltaTableStream(RequestWithTransactionId), df.filter(col(RequestLocationCodeColName).isNull
      && col(ServiceNumberColName).isNull
      && col(DateOfServiceColName).isNull
      && col(TransactionIdColName).isNotNull)),
    (DeltaTableStream(RequestWithoutTransactionId), df.filter(col(RequestLocationCodeColName).isNotNull
      && col(ServiceNumberColName).isNotNull
      && col(DateOfServiceColName).isNotNull))
  )
}
Note: this code only checks for null values in the columns, but I want to check for both null and empty strings. Please help.
Upvotes: 0
Views: 2458
Reputation: 23119
You can check for null with isNull
and for an empty String with ===,
then combine the conditions per column and apply a single filter
as below
import org.apache.spark.sql.functions.col

val columns = List("column1", "column2")
// true when a column is null or an empty string
val filter = columns
  .map(c => col(c).isNull || col(c) === "")
  .reduce(_ and _)
df.filter(filter)
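A minimal runnable sketch of this approach, assuming a local SparkSession; the column names and sample data are illustrative, and `trim` is an optional extra that also catches whitespace-only strings:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, trim}

object NullOrEmptyFilter {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("null-or-empty")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data: null, empty, whitespace-only, and a normal value
    val df = Seq(
      (null.asInstanceOf[String], "a"),
      ("", "b"),
      ("  ", "c"),
      ("x", "d")
    ).toDF("column1", "column2")

    // Keep rows where column1 is null, empty, or whitespace-only.
    // col === "" evaluates to null for null input, so isNull is still needed.
    val isBlank = col("column1").isNull || trim(col("column1")) === ""
    df.filter(isBlank).show()

    spark.stop()
  }
}
```

Applied to the question's function, each `col(...).isNull` in the filter would become a condition like `isBlank` above (and each `isNotNull` its negation).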
Upvotes: 1