Reputation: 41
I have a Spark DataFrame df:
A     B      C     D
True  True   True  True
True  False  True  True
True  None   True  None
True  NaN    NaN   False
True  NaN    True  True
Is there a way in PySpark to add a fifth column, as an int, that is 1 when columns A, B, C, D do not contain the value False in that row and 0 when they do (as the examples show, None/NaN should not count as False)? Hence:
A     B      C     D      E
True  True   True  True   1
True  False  True  True   0
True  None   True  None   1
True  NaN    NaN   False  0
True  NaN    True  True   1
This can be achieved on a pandas DataFrame with df.all(axis=1).astype(int).
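For instance, a minimal pandas sketch of what I mean:

df['E'] = df.all(axis=1).astype(int)  # skipna=True by default, so only an explicit False gives 0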
Any help with a PySpark equivalent would be much appreciated.
Upvotes: 0
Views: 1015
Reputation: 42332
I don't have anything to test, but try the code below:
from pyspark.sql import functions as F

df2 = df.withColumn(
    'E',
    (
        # greatest/least skip nulls, so this is True exactly when every
        # non-null value in the row is True (i.e. no False anywhere)
        (F.greatest(*df.columns) == F.least(*df.columns)) &
        (F.least(*df.columns) == F.lit(True))
    ).cast('int')
)
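Also untested, but an alternative sketch of the same idea (assuming A, B, C, D are boolean columns, so the NaN entries arrive as nulls): coalesce each column to True so nulls are ignored, then AND the columns together and cast to int.

from functools import reduce
from pyspark.sql import functions as F

# Nulls are treated as passing; only an explicit False makes E = 0.
df2 = df.withColumn(
    'E',
    reduce(
        lambda acc, c: acc & F.coalesce(F.col(c), F.lit(True)),
        df.columns,
        F.lit(True),
    ).cast('int')
)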
Upvotes: 1