JonathanMcM

Reputation: 41

PySpark equivalent of pandas `all` function

I have a spark dataframe df:

   A    B     C     D
 True  True  True  True
 True  False True  True
 True  None  True  None
 True  NaN   NaN   False
 True  NaN   True  True

Is there a way in PySpark to get a fifth column that checks whether columns A, B, C and D contain no False values (ignoring None/NaN), returning an int: 1 for True and 0 for False? Hence:

   A    B     C     D     E
 True  True  True  True   1
 True  False True  True   0
 True  None  True  None   1
 True  NaN   NaN   False  0
 True  NaN   True  True   1

This can be achieved on a pandas DataFrame with df.all(axis=1).astype(int).
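For reference, a minimal pandas sketch of the above (the sample data mirrors the table; note that `all` defaults to `skipna=True`, so None/NaN cells don't count as False):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [True, True, True, True, True],
    "B": [True, False, None, np.nan, np.nan],
    "C": [True, True, True, np.nan, True],
    "D": [True, True, None, False, True],
})

# Row-wise reduction: axis=1, nulls skipped, then boolean -> int
df["E"] = df.all(axis=1).astype(int)
```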

Any help for a pyspark equivalent would be most appreciated, please.

Upvotes: 0

Views: 1015

Answers (1)

mck

Reputation: 42332

I don't have anything to test, but try the code below. `greatest` and `least` both ignore nulls, so `E` is 1 exactly when every non-null value in the row is True:

import pyspark.sql.functions as F

df2 = df.withColumn(
    'E',
    (
        (F.greatest(*df.columns) == F.least(*df.columns)) &
        (F.least(*df.columns) == F.lit(True))
    ).cast('int')
)

Upvotes: 1
