Reputation: 672
I have 2 dataframes
data1 = {'Product': ['AAA','BBB','CCC','DDD','EEE','FFF'],
'Id': ['247610','287950','229XYZ','987340','111500','2345OZ'],
'Price':[40,50,0,985,34,0]}
data2 = {'Product': ['AAA','BBB','CCC','DDD','EEE'],
'Id': [508760,287950,678897,987340,111500],}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df1 as shown below:
Product Id Price
0 AAA 247610 40
1 BBB 287950 50
2 CCC 229XYZ 0
3 DDD 987340 985
4 EEE 111500 34
5 FFF 2345OZ 0
with this data types (df1 dtype):
Product object
Id object
Price int64
I want to compare it with another dataframe Df2 to get the best id. df2:
Product Id
0 AAA 508760
1 BBB 287950
2 CCC 678897
3 DDD 987340
4 EEE 111500
With data type (df2.dtypes):
Product object
Id int64
As you can see i cannot convert the id of my first df1 to int so i converted to float so that i can compare with the id of df2
df1['Id'] = pd.to_numeric(df1['Id'],errors='coerce')
df1["bestID"] = (df1["Id"].isin(df2["Id"]) * df1["Id"]).replace(0, "Nope")
print(df1)
I got the output below:
Product Id Price bestID
0 AAA 247610.0 40 Nope
1 BBB 287950.0 50 287950.0
2 CCC NaN 0 NaN
3 DDD 987340.0 985 987340.0
4 EEE 111500.0 34 111500.0
5 FFF NaN 0 NaN
The problem here is that
i need the id type to be in int i want to remove .0
from the id value: i can use
df1 = df1.replace(np.nan, 0, regex=True) df1['Id'] = df1['Id'].astype(int)
But the problem here that i got the df1 below(which is not what what i need) :
Product Id Price bestID
0 AAA 247610 40 Nope
1 BBB 287950 50 287950.0
2 CCC 0 0 0
3 DDD 987340 985 987340.0
4 EEE 111500 34 111500.0
5 FFF 0 0 0
Nan => Nope
.229XYZ
& 2345OZ
( i dont want the value to be 0 or nan)229XYZ
& 2345OZ
(if there is no solution).Expected Output:
Product Id Price bestID
0 AAA 247610 40 Nope
1 BBB 287950 50 287950
2 CCC 229XYZ 0 Nope
3 DDD 987340 985 987340
4 EEE 111500 34 111500
5 FFF 2345OZ 0 Nope
What are the possible way to fix this problem?
NB: I merged 2 files into 1 dataframe (df1) the first one is with datatype Id int64 and the second one with object types
Upvotes: 0
Views: 113
Reputation: 120401
It's easier to convert your 'Id' of df2 from int to str:
df2 = pd.DataFrame(data2).astype({'Id': str})
df1['bestId'] = df1["Id"].isin(df2["Id"])
>>> df1
Product Id Price bestId
0 AAA 247610 40 False
1 BBB 287950 50 True
2 CCC 229XYZ 0 False
3 DDD 987340 985 True
4 EEE 111500 34 True
5 FFF 2345OZ 0 False
If you want to replace True/False by Id/Nope:
df1['bestId'] = np.where(df1['bestId'], df1['Id'], 'Nope')
>>> df1
Product Id Price bestId
0 AAA 247610 40 Nope
1 BBB 287950 50 287950
2 CCC 229XYZ 0 Nope
3 DDD 987340 985 987340
4 EEE 111500 34 111500
5 FFF 2345OZ 0 Nope
Upvotes: 1