Bella

Reputation: 1017

How to compare columns of 2 dataframes and add a value to a new dataframe based on the result

I have 2 dataframes of the same length, and I'd like to compare a specific column between them. For each row, whichever dataframe has the bigger value in the first column should supply the value from its second column, and that value should be assigned to a new dataframe. See the example. The first dataframe:

       0   class
0    1.9       0
1    9.8       0
2    4.5       0
3    8.1       0
4    1.9       0

The second dataframe:

       0   class
0    1.4       1
1    7.8       1
2    8.5       1
3    9.1       1
4    3.9       1

The new dataframe should look like:

  class
0     0
1     0
2     1
3     1
4     1

Upvotes: 2

Views: 85

Answers (3)

jezrael

Reputation: 862571

Use numpy.where with DataFrame constructor:

df = pd.DataFrame({'class': np.where(df1[0] > df2[0], df1['class'], df2['class'])})

Or DataFrame.where:

df = df1[['class']].where(df1[0] > df2[0], df2[['class']])

print (df)
   class
0      0
1      0
2      1
3      1
4      1
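
For anyone who wants to run both variants end to end, here is a minimal self-contained sketch. It rebuilds the sample frames from the question, assuming the value column is labelled with the integer 0 (as the indexing above implies). The names mask, via_numpy and via_where are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frames from the question; the value column is labelled with the integer 0.
df1 = pd.DataFrame({0: [1.9, 9.8, 4.5, 8.1, 1.9], 'class': [0, 0, 0, 0, 0]})
df2 = pd.DataFrame({0: [1.4, 7.8, 8.5, 9.1, 3.9], 'class': [1, 1, 1, 1, 1]})

# True on rows where df1's value is strictly larger than df2's.
mask = df1[0] > df2[0]

# numpy.where picks df1's class where mask is True, df2's class otherwise.
via_numpy = pd.DataFrame({'class': np.where(mask, df1['class'], df2['class'])})

# DataFrame.where keeps df1's class where mask is True and fills the rest from df2.
via_where = df1[['class']].where(mask, df2[['class']])

print(via_numpy)
print(via_where)
```

Both frames should hold the same class column. Note that on ties (equal values) the strict comparison is False, so the class from df2 is taken.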

EDIT:

If there is an additional condition, use numpy.select, and numpy.isclose if the floats need to be compared within a tolerance:

print (df2)
     0  class
0  1.4      1
1  7.8      1
2  8.5      1
3  9.1      1
4  1.9      1


masks = [df1[0] == df2[0], df1[0] > df2[0]]
# to compare floats within a tolerance, use np.isclose instead:
#masks = [np.isclose(df1[0], df2[0]), df1[0] > df2[0]]
vals = ['not_determined', df1['class']]
df = pd.DataFrame({'class': np.select(masks, vals, df2['class'])})
print (df)
            class
0               0
1               0
2               1
3               1
4  not_determined

Or:

masks = [df1[0] == df2[0], df1[0] > df2[0]]
vals = ['not_determined', 0]
df = pd.DataFrame({'class': np.select(masks, vals, 1)})
print (df)
            class
0               0
1               0
2               1
3               1
4  not_determined

An out-of-the-box alternative:

df = np.sign(df1[0].sub(df2[0])).map({1:0, -1:1, 0:'not_determined'}).to_frame('class')
print (df)
            class
0               0
1               0
2               1
3               1
4  not_determined
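
A self-contained sketch of the numpy.select variant, using the modified df2 from the EDIT (last value 1.9, so row 4 is a tie). One assumption that is not part of the original code: the choices and the default are passed as object arrays. Some NumPy versions may refuse to promote a string choice and an integer choice to a common dtype, and object arrays sidestep that. The names choices, default and result are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({0: [1.9, 9.8, 4.5, 8.1, 1.9], 'class': [0, 0, 0, 0, 0]})
df2 = pd.DataFrame({0: [1.4, 7.8, 8.5, 9.1, 1.9], 'class': [1, 1, 1, 1, 1]})

# Conditions are checked in order; the first one that is True wins for each row.
masks = [
    np.isclose(df1[0], df2[0]),       # tie, compared within a float tolerance
    (df1[0] > df2[0]).to_numpy(),     # df1 is strictly larger
]

# Object dtype (an assumption of this sketch) lets strings and integers share one array.
choices = [
    np.full(len(df1), 'not_determined', dtype=object),
    df1['class'].to_numpy(dtype=object),
]
default = df2['class'].to_numpy(dtype=object)  # used when no condition matches

result = pd.DataFrame({'class': np.select(masks, choices, default=default)})
print(result)
```

The resulting column has object dtype because it mixes integers with a string, so convert or filter it before doing arithmetic on it.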

Upvotes: 3

Mark Wang

Reputation: 2757

Since class is 0 in the first dataframe and 1 in the second, you could try:

df1[0].lt(df2[0]).astype(int)

For generic solutions, check jezrael's answer.
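
A short sketch of how the shortcut above could be turned into the requested one-column dataframe. The to_frame('class') call and the name out are additions for illustration, and the sample frames are rebuilt from the question with an integer column label 0.

```python
import pandas as pd

df1 = pd.DataFrame({0: [1.9, 9.8, 4.5, 8.1, 1.9], 'class': [0, 0, 0, 0, 0]})
df2 = pd.DataFrame({0: [1.4, 7.8, 8.5, 9.1, 3.9], 'class': [1, 1, 1, 1, 1]})

# lt gives True where df1 < df2; casting to int maps True -> 1 and False -> 0.
out = df1[0].lt(df2[0]).astype(int).to_frame('class')
print(out)
```

This only works because the class labels happen to be 0 and 1. Rows with equal values yield 0 here, which differs from the numpy.where approach, where ties take the class from the second dataframe.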

Upvotes: 2

Georgina Skibinski

Reputation: 13387

Try this one:

>>> import numpy as np
>>> import pandas as pd
>>> df_1
     0  class
0  1.9      0
1  9.8      0
2  4.5      0
3  8.1      0
4  1.9      0
>>> df_2
     0  class
0  1.4      1
1  7.8      1
2  8.5      1
3  9.1      1
4  3.9      1
>>> df_3=pd.DataFrame()
>>> df_3["class"]=np.where(df_1["0"]>df_2["0"], df_1["class"], df_2["class"])
>>> df_3
   class
0      0
1      0
2      1
3      1
4      1
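
Note that this session indexes the column with the string "0", while the other answers use the integer 0. Which one works depends on how the frame was created; the printed output looks identical either way. A small sketch to check which label a frame actually has (int_labelled and str_labelled are hypothetical frames for illustration):

```python
import pandas as pd

# Two frames that print the same but carry different column labels.
int_labelled = pd.DataFrame({0: [1.9, 9.8], 'class': [0, 0]})
str_labelled = pd.DataFrame({'0': [1.9, 9.8], 'class': [0, 0]})

# Membership tests on .columns show which key is valid for indexing.
has_int_key = 0 in int_labelled.columns      # integer label present
has_str_key = '0' in int_labelled.columns    # string label absent
print(has_int_key, has_str_key)
print(list(str_labelled.columns))
```

If the lookup raises a KeyError, inspecting list(df.columns) as above shows whether to use 0 or "0".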

Upvotes: 1
