LamaMo
LamaMo

Reputation: 626

Mix comparison of a multiple columns in two dataframe and add a new one using python | pandas

I have file1 that have 6 columns as follow:

proNum   proName   color      path    comNum    comName
1        apple     red        ewtt    43        namex
1        apple     red        lfor    27        namey
1        apple     red        bcms    276       namez
4        orange    yellow     owif    7         namea
4        orange    yellow     zxaq    19        nameb
8        burry     bink       lpoq    99        namec

And file2 with 2 columns as follow:

comName    color
nameb      red
namex      bink
nameb      black
namez      blue
namec      yellow
namey      red
namez      red
nameb      bink

What I'm looking for is adding a column in file1 resulted by/from: check/go through each value of comName in file1, compare it one-by-one with comName in file2 if it found the same value then takes the corresponding value of color in file2 and compare it with color in file1 if it's the same put the value of color in that new column and exit then check the next value of comName.

If the color values not equal, keep going in file2 because comName appear one or more than one for some of them, if checking finish in file2 and nothing equal in term of color then put No/Nan in that new column.

Here is the output that should be:

proNum   proName   color      path    comNum    comName   New_color
1        apple     red        ewtt    43        namex     No
1        apple     red        lfor    27        namey     red
1        apple     red        bcms    276       namez     red
4        orange    yellow     owif    7         namea     No
4        orange    yellow     zxaq    19        nameb     No
8        burry     bink       lpoq    99        namec     No

It seems to be easy but those mixes comparisons really confused me of how to do it, so any help/ideas will be appreciated.

Upvotes: 0

Views: 39

Answers (3)

Filip Oleksinski
Filip Oleksinski

Reputation: 136

Applying a left join on the tables on the comName and color columns should leed to the desired output

file2['color_y']=['red','bink','black','blue','yellow','red','red','bink'] file1.merge(file2, how='left', on=['comName','color'].fillna('No')

Upvotes: 1

ALollz
ALollz

Reputation: 59549

You can use a merge here, that way you only assign a New_color when it matches exactly on comName and color.

df1.merge(df2.assign(New_color=df2.color), how='left').fillna('No')

Output

   proNum proName   color  path  comNum comName New_color
0       1   apple     red  ewtt      43   namex        No
1       1   apple     red  lfor      27   namey       red
2       1   apple     red  bcms     276   namez       red
3       4  orange  yellow  owif       7   namea        No
4       4  orange  yellow  zxaq      19   nameb        No
5       8   burry    bink  lpoq      99   namec        No

Upvotes: 2

Try this code:

df2 = df2.drop_duplicates(['comName'], keep='last')
result = df.merge(df2, on='comName', how='left', suffixes=('', '_y'))
result['New_color'] = result.apply(lambda x: x.color if x.color == x.color_y else 'No', 1)
result = result.drop(['color_y'], 1)

df is file1 and df2 is file2.

Output:

  proNum proName   color  path comNum comName New_color
0      1   apple     red  ewtt     43   namex        No
1      1   apple     red  lfor     27   namey       red
2      1   apple     red  bcms    276   namez       red
3      4  orange  yellow  owif      7   namea        No
4      4  orange  yellow  zxaq     19   nameb        No
5      8   burry    bink  lpoq     99   namec        No

Upvotes: 1

Related Questions