Reputation: 20590
How to select * in a PySpark join
impression_rdd.join(
    click_rdd,
    impression_rdd.session_id == click_rdd.session_id,
    "left_outer"
).select(impression_rdd.*)  # <-- pseudocode; how do you do this?
Basically, the SQL equivalent of:
SELECT impression.* FROM impression LEFT JOIN click ON (impression.session_id = click.session_id)
Upvotes: 4
Views: 1577
Reputation: 38452
Two other constructs equivalent to zero323's answer:
(impressions.join(clicks, 'session_id', 'left_outer')
.select(*impressions.columns))
And if you only need to drop one column from the right-hand table, say 'count', this might be more readable:
(impressions.join(clicks, 'session_id', 'left_outer')
.drop('count'))
Upvotes: 1
Reputation: 330203
You can simply add an alias and a couple of quotes to your pseudocode:
(impressions.alias("impressions")
 .join(clicks, ["session_id"], "left_outer")
 .select("impressions.*"))
Upvotes: 2