Reputation: 21
I need to join these two DataFrames and sum the rate.
I've tried:
df_joined = df_text.join(df_words, f.expr("text rlike word"), 'left')
or
df_joined = df_text.join(df_words, on=df_text.text.contains(df_words.word), how='left')
But both of these also match parts of words. For example, df_words contains both "slow" and "slowly"; if "slowly" appears in the text, both rates are joined, but I need only the one for "slowly".
Any suggestions? Thanks.
Upvotes: 1
Views: 97
Reputation: 21
This seems to work fine:
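from pyspark.sql import functions as f

# Split each text on spaces, then explode to one row per word
df_text = df_text.withColumn('txt_split', f.split(df_text['text'], ' '))
# Equi-join on the exact word, so "slowly" no longer matches "slow"
df_join = df_text.withColumn('word', f.explode('txt_split')) \
    .join(df_words, 'word', 'left')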
Upvotes: 1