sanjayr

PySpark: how to left merge DataFrames

In pandas, I can left-merge two DataFrames like this:

import pandas as pd

df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [1, 2, 3, 5]})
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [5, 6, 7, 8]})

df1.merge(df2, how='left', left_on='lkey', right_on='rkey')


  lkey  value_x rkey  value_y
0  foo        1  foo        5
1  foo        1  foo        8
2  bar        2  bar        6
3  baz        3  baz        7
4  foo        5  foo        5
5  foo        5  foo        8
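
The value_x / value_y names above come from merge's default suffixes=('_x', '_y'), which pandas applies to any non-key column present in both frames. The sketch below reruns the same left merge with explicit suffixes; the names '_left' and '_right' are arbitrary choices for illustration. Each 'foo' row in df1 matches both 'foo' rows in df2, which is why the result has 6 rows instead of 4.

```python
import pandas as pd

df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [1, 2, 3, 5]})
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [5, 6, 7, 8]})

# Same left merge, but name the suffixes for the clashing 'value' column explicitly
merged = df1.merge(df2, how='left', left_on='lkey', right_on='rkey',
                   suffixes=('_left', '_right'))

print(merged.columns.tolist())  # → ['lkey', 'value_left', 'rkey', 'value_right']
print(len(merged))              # → 6 (each 'foo' in df1 matches two rows in df2)
```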

What is the equivalent in PySpark? Is it a left join?


Answers (1)

Shubham Jain

Yes, it is a left (outer) join. In PySpark, pass the join condition and the join type 'left_outer' to DataFrame.join:

df = df1.join(df2, df1.lkey==df2.rkey, 'left_outer')
