Reputation: 523

How to select elements of a column of a dataframe with respect to a column of the another dataframe?

How I can use two dataframes, and select elements of df2, if a column in df1 is included in a column in df2and NA otherwise.

df2:

name
summer
winter
water
play

df1:

col1               
play ground
winter cold
something
work

output:

col1               name
play ground        play
winter cold        winter
something          NA
work               NA

Upvotes: 0

Answers (3)

Linus

Reputation: 669

df1 = spark.createDataFrame([("play ground",),("winter cold",),("something",),("work",)], ['col1',])
df2 = spark.createDataFrame([("summer",),("winter",),("play bc",),("play",)], ['name',])

df1 = df1.withColumn('common_word', explode(split(col('col1'), '\s')))
# Also split & explode Column 'name' of df2.
df2 = df2.withColumn('common_word', explode(split(col('name'), '\s')))

(
    df1
        .join(df2, ['common_word'], "left")
        .sort('col1')
    .fillna("NA")
    .show()
)
+-----------+-----------+-------+
|common_word|       col1|   name|
+-----------+-----------+-------+
|     ground|play ground|     NA|
|       play|play ground|play bc|
|       play|play ground|   play|
|  something|  something|     NA|
|       cold|winter cold|     NA|
|     winter|winter cold| winter|
|       work|       work|     NA|
+-----------+-----------+-------+

Upvotes: 0

过过招

Reputation: 4224

It is recommended to use the contains condition directly to join.

df = df1.join(df2, on=[df1.col1.contains(df2.name)], how='left')
df.show(truncate=False)

Upvotes: 0

wwnde

Reputation: 26676

#Create match column
df1 = df1.alias('df1').withColumn('col_new',explode(split('col1','\s')))




new = (df1.join(df2, how='left',on=df1.col_new==df2.name)#merge on common columns
       .drop('col_new')#drop the match column introduced
       .orderBy([df2.name.desc(),'name'])#Order the df
       .drop_duplicates(['col1'])#eliminate duplicates
      ).show()

+-----------+------+
|       col1|  name|
+-----------+------+
|play ground|  play|
|  something|  null|
|winter cold|winter|
|       work|  null|
+-----------+------+

Upvotes: 1

How to select elements of a column of a dataframe with respect to a column of the another dataframe?

Answers (3)

Related Questions