Learner
Learner

Reputation: 672

Pandas: Get matched value between two columns

I have 2 dataframes

data1 = {'Product': ['AAA','BBB','CCC','DDD','EEE','FFF'],
        'Id': ['247610','287950','229XYZ','987340','111500','2345OZ'],
        'Price':[40,50,0,985,34,0]}

data2 = {'Product': ['AAA','BBB','CCC','DDD','EEE'],
        'Id': [508760,287950,678897,987340,111500],}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2) 

df1 as shown below:

  Product    Id        Price
0 AAA       247610     40
1 BBB       287950     50
2 CCC       229XYZ      0
3 DDD       987340    985
4 EEE       111500     34
5 FFF       2345OZ      0

with this data types (df1 dtype):

Product    object
Id         object
Price       int64

I want to compare it with another dataframe Df2 to get the best id. df2:

  Product      Id
0     AAA     508760
1     BBB     287950
2     CCC     678897
3     DDD     987340
4     EEE     111500

With data type (df2.dtypes):

Product    object
Id          int64

As you can see i cannot convert the id of my first df1 to int so i converted to float so that i can compare with the id of df2

df1['Id'] = pd.to_numeric(df1['Id'],errors='coerce')
df1["bestID"] = (df1["Id"].isin(df2["Id"]) * df1["Id"]).replace(0, "Nope")
print(df1)

I got the output below:

 Product        Id  Price    bestID
0     AAA  247610.0     40      Nope
1     BBB  287950.0     50  287950.0
2     CCC       NaN      0       NaN
3     DDD  987340.0    985  987340.0
4     EEE  111500.0     34  111500.0
5     FFF       NaN      0       NaN

The problem here is that

But the problem here that i got the df1 below(which is not what what i need) :

  Product      Id  Price    bestID
0     AAA  247610     40      Nope
1     BBB  287950     50  287950.0
2     CCC       0      0         0
3     DDD  987340    985  987340.0
4     EEE  111500     34  111500.0
5     FFF       0      0         0

Expected Output:

 Product        Id  Price    bestID
0     AAA    247610     40     Nope
1     BBB    287950     50     287950
2     CCC    229XYZ      0     Nope
3     DDD    987340    985     987340
4     EEE    111500     34     111500
5     FFF    2345OZ      0     Nope

What are the possible way to fix this problem?

NB: I merged 2 files into 1 dataframe (df1) the first one is with datatype Id int64 and the second one with object types

Upvotes: 0

Views: 113

Answers (1)

Corralien
Corralien

Reputation: 120401

It's easier to convert your 'Id' of df2 from int to str:

df2 = pd.DataFrame(data2).astype({'Id': str})
df1['bestId'] = df1["Id"].isin(df2["Id"])
>>> df1

  Product      Id  Price  bestId
0     AAA  247610     40   False
1     BBB  287950     50    True
2     CCC  229XYZ      0   False
3     DDD  987340    985    True
4     EEE  111500     34    True
5     FFF  2345OZ      0   False

If you want to replace True/False by Id/Nope:

df1['bestId'] = np.where(df1['bestId'], df1['Id'], 'Nope')
>>> df1
  Product      Id  Price  bestId
0     AAA  247610     40    Nope
1     BBB  287950     50  287950
2     CCC  229XYZ      0    Nope
3     DDD  987340    985  987340
4     EEE  111500     34  111500
5     FFF  2345OZ      0    Nope

Upvotes: 1

Related Questions