Reputation: 37
I have 2 tables, first is the testappointment table and 2nd is the actualTests table. i want to join the 2 df in such a way that the resulting table should have column "NoShows". this column depicts that the person booked a appointment for that date, but actually did not show up for the test at the test date. Example-
testAppointment Table
personId | testDate |
---|---|
x | 2021-02-12 |
y | 2021-03-18 |
x | 2020-11-01 |
z | 2020-09-10 |
y | 2021-01-08 |
z | 2020-12-24 |
Test Table
personId | ActualtestDate |
---|---|
x | 2021-02-12 |
y | 2021-03-18 |
x | 2020-11-01 |
z | 2020-09-10 |
Since, the person y(2021-01-08) and z(2020-12-24) did not showup for their tests on 2 dates hence, there is no entry for them in the test table. hence, i want to have no-show for them in their respective rows after the two tables are joined.
Upvotes: 0
Views: 9703
Reputation: 42332
You can do a left join and create a new column based on whether ActualtestDate
is null:
result = testappointment.join(
test,
(testappointment.testDate == test.ActualtestDate) & (testappointment.personId == test.personId),
'left'
).select(
testappointment.personId,
testappointment.testDate,
test.ActualtestDate.isNull().alias('noshow')
)
result.show()
+--------+----------+------+
|personId| testDate|noshow|
+--------+----------+------+
| x|2021-02-12| false|
| y|2021-03-18| false|
| x|2020-11-01| false|
| z|2020-09-10| false|
| y|2021-01-08| true|
| z|2020-12-24| true|
+--------+----------+------+
Upvotes: 1