Reputation: 681
I have 2 data frames
Name1 Value
A1 1
A2 2
A3 0
A4 -3
Name2 Value
B1 1
B2 -1
B3 -10
B4 4
Now I want a third data frame, that picks the rows of the two data frames above, subject to a condition, in particular: "Pick the row with the smaller value"
Desired Output
A1 1 # <- What happens when tied?
B2 -1
B3 -10
A4 -3
The next question is what happens if there are ties? Actually, I would be open to take the first one for instance, or randomly picked, whatever is easier.
Upvotes: 2
Views: 3290
Reputation: 8773
One way is to create the third dataframe as a copy of one of the initial ones, and afterwards substitute the values whose counterpart in the other dataframe is smaller:
df3 = df1.copy()
# Change this expression to choose how to treat ties
min_mask = df1.Value>df2.Value # i.e. df1.Value>=df2.Value
df3.loc[min_mask] = df2.rename(columns={'Name2':'Name1'})[min_mask]
Output df3
:
Name1 Value
0 A1 1
1 B2 -1
2 B3 -10
3 A4 -3
There is a rename
in df2
in order to write the whole row to df3
, otherwise, the columns with different name get a NaN
value.
About the ties, you can easily choose what do you want to do by changing the greater than by greater or equal than.
To this date, there have been 4 answers to this question, all of them completely answer the question using different approaches, therefore, depending on their final application, this one might not be the best answer.
To ease the process of finding the adequate answer, I am posting a summary of the elapsed time of each of the 4 answers as a function of the number of rows in df1
and df2
, which usually is one of the key factors when choosing.
Upvotes: 2
Reputation: 18906
We could recreate the dataframe by zipping together df1 and df2 values and using min()
passing a key:
df3 = pd.DataFrame((min(i, key = lambda x: x[1]) for i in zip(df1.values, df2.values)),
columns = ['Desired','Output'])
Returns:
Desired Output
0 A1 1
1 B2 -1
2 B3 -10
3 A4 -3
Upvotes: 3
Reputation: 323226
Just do it with drop_duplicates
pd.concat([df1,df2.rename(columns={'Name2':'Name1'})]).sort_values('Value').reset_index().drop_duplicates('index').sort_values('index').set_index('index')
Out[36]:
Name1 Value
index
0 A1 1
1 B2 -1
2 B3 -10
3 A4 -3
Upvotes: 2
Reputation: 164623
This is one way, assuming you begin with dataframes df1
and df2
.
res = df1.rename(columns={'Name1': 'Name'})\
.append(df2.rename(columns={'Name2': 'Name'}))\
.sort_values('Value')
res = res.loc[~res.index.duplicated()]\
.sort_index()
Result
Name Value
0 A1 1
1 B2 -1
2 B3 -10
3 A4 -3
Explanation
There are 2 steps:
df2
to df1
after aligning column names and sort by Value
.Upvotes: 2