Reputation: 6173
I have two data frames with the same column types.
First Dataframe (df1)
data = [['BTC', 2], ['ETH', 1], ['ADA', 100]]
df1 = pd.DataFrame(data, columns=['Coin', 'Quantity'])
Coin Quantity
BTC 2
ETH 1
ADA 100
... ...
Second Dataframe (df2)
data = [['BTC', 50000], ['FTM', 50], ['ETH', 1500], ['LRC', 5], ['ADA', 20]]
df2 = pd.DataFrame(data, columns=['code_name', 'selling rate'])
code_name selling rate
BTC 50000
FTM 50
ETH 1500
LRC 5
ADA 20
... ...
Expected output (FTM and LRC should be removed)
Coin Quantity selling rate
BTC 2 50000
ETH 1 1500
ADA 100 20
... ... ...
What I have tried
df1.merge(df2, how='outer', left_on=['Coin'], right_on=['code_name'])
df = np.where(df1['Coin'] == df2['code_name'])
Both codes did not give me the expected output. I searched on StackOverflow and couldn't find any helpful answer. Can anyone give a solution or make this question as duplicate if a related question exist?
Upvotes: 4
Views: 1043
Reputation: 1104
Since merge()
is slow for large dataset. I prefer not to use it as long as I have a faster solution. Therefore, I suggest the following:
full_df = df1.copy()
full_df['selling_rate'] = list(
df2['selling_rate'][df2['code_name'].isin(df1['Coin'].unique())])
Note: This turns to the expected solution if df1
and df2
are in the same order with respect to Coin
and code_name
. If they are not, you should use sort_values()
before the above code.
Upvotes: 1
Reputation: 449
What you need is an inner join, not an outer join. Inner joins only retain records that are common in the two tables you're joining together.
import pandas as pd
# Make the first data frame
df1 = pd.DataFrame({
'Coin': ['BTC', 'ETH', 'ADA'],
'Quantity': [2, 1, 100]
})
# Make the second data frame
df2 = pd.DataFrame({
'code_name': ['BTC', 'FTM', 'ETH', 'LRC', 'ADA'],
'selling_rate': [50000, 50, 1500, 5, 20]
})
# Merge the data frames via inner join. This only keeps entries that appear in
# both data frames
full_df = df1.merge(df2, how = 'inner', left_on = 'Coin', right_on = 'code_name')
# Drop the duplicate column
full_df = full_df.drop('code_name', axis = 1)
Upvotes: 4