how search on a specific rows of a dataframe to find a match in a second dataframe?

Question

I have a large data set and I wanna do the following task in an efficient way. suppose we have 2 data frames. for each element in df2 I wanna search in the first data set df1 only in row where the first 2 letters are in common then the word with the most common token is choosen. Let's see in an example:

df1: 

common work   co
summer hot    su
apple         ap
colorful fall  co
support it     su
could comp     co

df2:

condition work it  co
common mistakes    co
could comp work    co
summer             su

Take the first row of df2 as an example (condition work it). I wanna find a row in df1 where they have the same first_two and have the most common token.

The first_two of condition work it is co. so I wanna search in df1 where first_two is co. So the search is done among: common work, colorful fall, could comp since condition work it has 1 common token with common work it is selected.

output:

df2:
name               first_two      match
condition work it  co              `common work`     
common mistakes    co              `common work`
could comp work    co              `could comp` 
summer             su              `summer hot'
appears            ap                  Nane

The last row is Nane since there is no common word between appears and apple

I did following:

df3=(df1.groupby(['first_two'])
      .agg({'name': lambda x: ",".join(x)})
      .reset_index())
merge_=df3.merge(df2, on='first_two',how='inner')

But now I have to search in name_x for each name_y. how to find an element of name_x whose has the most common token with name_y?

how search on a specific rows of a dataframe to find a match in a second dataframe?

Answers (1)

Related Questions