Kalana
Kalana

Reputation: 6173

Compare and join two columns in two dataframes

I have two data frames with the same column types.

First Dataframe (df1)

data = [['BTC', 2], ['ETH', 1], ['ADA', 100]]
df1 = pd.DataFrame(data, columns=['Coin', 'Quantity'])
Coin     Quantity
BTC          2
ETH          1
ADA        100
...        ...

Second Dataframe (df2)

data = [['BTC', 50000], ['FTM', 50], ['ETH', 1500], ['LRC', 5], ['ADA', 20]]
df2 = pd.DataFrame(data, columns=['code_name', 'selling rate'])
code_name     selling rate
BTC               50000
FTM                  50
ETH                1500
LRC                   5
ADA                  20
...                 ...

Expected output (FTM and LRC should be removed)

Coin     Quantity     selling rate
BTC          2           50000
ETH          1            1500
ADA        100              20
...        ...             ...

What I have tried

df1.merge(df2, how='outer', left_on=['Coin'], right_on=['code_name'])
df = np.where(df1['Coin'] == df2['code_name'])

Both codes did not give me the expected output. I searched on StackOverflow and couldn't find any helpful answer. Can anyone give a solution or make this question as duplicate if a related question exist?

Upvotes: 4

Views: 1043

Answers (2)

Maryam Bahrami
Maryam Bahrami

Reputation: 1104

Since merge() is slow for large dataset. I prefer not to use it as long as I have a faster solution. Therefore, I suggest the following:

full_df = df1.copy()
full_df['selling_rate'] = list(
    df2['selling_rate'][df2['code_name'].isin(df1['Coin'].unique())])

Note: This turns to the expected solution if df1 and df2 are in the same order with respect to Coin and code_name. If they are not, you should use sort_values() before the above code.

Upvotes: 1

rossdrucker9
rossdrucker9

Reputation: 449

What you need is an inner join, not an outer join. Inner joins only retain records that are common in the two tables you're joining together.

import pandas as pd

# Make the first data frame
df1 = pd.DataFrame({
    'Coin': ['BTC', 'ETH', 'ADA'],
    'Quantity': [2, 1, 100]
})

# Make the second data frame
df2 = pd.DataFrame({
    'code_name': ['BTC', 'FTM', 'ETH', 'LRC', 'ADA'],
    'selling_rate': [50000, 50, 1500, 5, 20]
})

# Merge the data frames via inner join. This only keeps entries that appear in
# both data frames
full_df = df1.merge(df2, how = 'inner', left_on = 'Coin', right_on = 'code_name')

# Drop the duplicate column
full_df = full_df.drop('code_name', axis = 1)

Upvotes: 4

Related Questions