eternity1
eternity1

Reputation: 681

Create new dataframe from two other dataframes conditional on each row value

I have 2 data frames

Name1    Value
A1         1
A2         2
A3         0
A4        -3

Name2    Value
B1         1
B2        -1
B3       -10
B4         4

Now I want a third data frame, that picks the rows of the two data frames above, subject to a condition, in particular: "Pick the row with the smaller value"

Desired Output
A1         1         # <- What happens when tied?
B2        -1
B3       -10
A4        -3

The next question is what happens if there are ties? Actually, I would be open to take the first one for instance, or randomly picked, whatever is easier.

Upvotes: 2

Views: 3290

Answers (4)

OriolAbril
OriolAbril

Reputation: 8773

One way is to create the third dataframe as a copy of one of the initial ones, and afterwards substitute the values whose counterpart in the other dataframe is smaller:

df3 = df1.copy()
# Change this expression to choose how to treat ties
min_mask = df1.Value>df2.Value  # i.e. df1.Value>=df2.Value
df3.loc[min_mask] = df2.rename(columns={'Name2':'Name1'})[min_mask]

Output df3:

 Name1  Value
0   A1     1
1   B2    -1
2   B3   -10
3   A4    -3

There is a rename in df2 in order to write the whole row to df3, otherwise, the columns with different name get a NaN value.

About the ties, you can easily choose what do you want to do by changing the greater than by greater or equal than.

Edit:

To this date, there have been 4 answers to this question, all of them completely answer the question using different approaches, therefore, depending on their final application, this one might not be the best answer.

To ease the process of finding the adequate answer, I am posting a summary of the elapsed time of each of the 4 answers as a function of the number of rows in df1 and df2, which usually is one of the key factors when choosing.

Elapsed time comparison between methods

Upvotes: 2

Anton vBR
Anton vBR

Reputation: 18906

We could recreate the dataframe by zipping together df1 and df2 values and using min() passing a key:

df3 = pd.DataFrame((min(i, key = lambda x: x[1]) for i in zip(df1.values, df2.values)),
                   columns = ['Desired','Output'])

Returns:

  Desired  Output
0      A1       1
1      B2      -1
2      B3     -10
3      A4      -3

Upvotes: 3

BENY
BENY

Reputation: 323226

Just do it with drop_duplicates

pd.concat([df1,df2.rename(columns={'Name2':'Name1'})]).sort_values('Value').reset_index().drop_duplicates('index').sort_values('index').set_index('index')
Out[36]: 
      Name1  Value
index             
0        A1      1
1        B2     -1
2        B3    -10
3        A4     -3

Upvotes: 2

jpp
jpp

Reputation: 164623

This is one way, assuming you begin with dataframes df1 and df2.

res = df1.rename(columns={'Name1': 'Name'})\
         .append(df2.rename(columns={'Name2': 'Name'}))\
         .sort_values('Value')

res = res.loc[~res.index.duplicated()]\
         .sort_index()

Result

  Name  Value
0   A1      1
1   B2     -1
2   B3    -10
3   A4     -3

Explanation

There are 2 steps:

  1. Append df2 to df1 after aligning column names and sort by Value.
  2. Drop duplicates by index, then sort by index.

Upvotes: 2

Related Questions