Reputation: 1939
In Pandas I can merge two dataframes like so:
import pandas as pd
df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [1, 2, 3, 5]})
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [5, 6, 7, 8]})
df1.merge(df2, how='left', left_on='lkey', right_on='rkey')
  lkey  value_x rkey  value_y
0  foo        1  foo        5
1  foo        1  foo        8
2  bar        2  bar        6
3  baz        3  baz        7
4  foo        5  foo        5
5  foo        5  foo        8
What would the equivalent of this be in PySpark? A left join?
Upvotes: 0
Views: 141
Reputation: 5526
You can apply a join in PySpark; 'left_outer' is the equivalent of pandas how='left':
df = df1.join(df2, df1.lkey == df2.rkey, 'left_outer')
Upvotes: 1