Reputation: 4301
I have two DataFrames, A and B, like:
DataFrame A is:
+----+-----+
| k| v|
+----+-----+
|key1|False|
|key2|False|
|key3|False|
|key4|False|
|key5|False|
|key6|False|
+----+-----+
DataFrame B is:
+----+----+
| k| v|
+----+----+
|key2|True|
|key3|True|
+----+----+
I want to left join A and B on the k column and combine the v columns, so that the result looks like:
+----+-----+
| k| v|
+----+-----+
|key1|False|
|key2|True |
|key3|True |
|key4|False|
|key5|False|
|key6|False|
+----+-----+
I suppose the code is something like:
A.join(B, 'k', 'left_outer')
But I don't know how to compute the combined v column.
I borrowed the idea from @Vitaliy Kotlyarenko to modify my code:
from pyspark.sql import functions as F
# After the join there are two v columns; coalesce prefers B's value (null when the key is absent from B) and drop('v') removes both original v columns.
A.join(B, 'k', 'left_outer').withColumn('value', F.coalesce(B['v'], A['v'])).drop('v')
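For completeness, here is a minimal end-to-end sketch of this approach. It assumes a local SparkSession named spark (not part of the original post); the sample data mirrors the tables above:

from pyspark.sql import SparkSession, functions as F

# Assumed setup: a local session, only for trying the snippet out.
spark = SparkSession.builder.master('local[*]').getOrCreate()

A = spark.createDataFrame([(f'key{i}', False) for i in range(1, 7)], ['k', 'v'])
B = spark.createDataFrame([('key2', True), ('key3', True)], ['k', 'v'])

# Left outer join keeps every key from A; for keys absent from B the
# right-hand v is null, so coalesce falls back to A's value.
result = (A.join(B, 'k', 'left_outer')
          .select('k', F.coalesce(B['v'], A['v']).alias('v')))

result.orderBy('k').show()  # key2 and key3 come out true, the rest false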
Upvotes: 0
Views: 68
Reputation: 2967
You can use the withColumn method:
from pyspark.sql import functions as F

(A.join(B, 'k', 'left_outer')
 .withColumn('value', F.coalesce(B['v'], A['v']))  # B's v is null for keys not in B
 .select('k', 'value'))
I'm not sure about the syntax correctness of the example above - it was originally written in Scala and translated to Python, but it should give the idea.
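A variant worth noting (my own sketch, not part of the original answer): renaming B's column before the join avoids the duplicate v name entirely, so every reference stays unambiguous:

from pyspark.sql import functions as F

# v_b is a hypothetical helper column name, chosen here only for illustration.
result = (A.join(B.withColumnRenamed('v', 'v_b'), 'k', 'left_outer')
          .withColumn('v', F.coalesce(F.col('v_b'), F.col('v')))
          .drop('v_b'))

This keeps the combine logic independent of which parent DataFrame each column came from.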
Upvotes: 2