Reputation: 11
I searched for an existing question about this in PySpark but found nothing.
I have a DataFrame of messy names called df1 and a DataFrame of clean names that I prepared, called df2 (both shown in the image). How can I use .join() and .isin(), or anything else, to obtain the last table in the attached image?
Here is the image:
I have tried:
cond = [df2[Clean_names].isin(df1[Names])]
df1 = df1.join(df2, cond, "left")
but this raised an error saying that .join() expects different arguments. I'm sorry, I no longer have the exact error log. The real DataFrames are quite big, so I can't use any iterative operations (e.g. for loops, pandas with .loc[], or pandas at all).
Also, I just created an account on Stack Overflow, so I'm sorry I couldn't format my question better.
Upvotes: 1
Views: 263