Reputation: 2685
I am totally new to Python. I have two DataFrames built from the same dataset: one is the input and one is the output.
Here is my input DataFrame:
Document_ID OFFSET PredictedFeature
0 0 2000
0 8 2000
0 16 2200
0 23 2200
0 30 2200
1 0 2100
1 5 2100
1 7 2100
I give this as input to my ML model, which returns its output in the same format.
My output looks like this:
Document_ID OFFSET PredictedFeature
0 0 2000
0 8 2000
0 16 2100
0 23 2100
0 30 2200
1 0 2000
1 5 2000
1 7 2100
Given these two DataFrames, I want to check, for each Document_ID and OFFSET, whether the input feature is the same as the output feature. If it is, I want to put True in a new column; if it is not, I want to put False.
For example, in the sample data above, for Document_ID 0 and OFFSET 16 the input feature is 2200 and the output feature is 2100, so the value should be False.
Can anyone please help me with this? Anything will be helpful.
Upvotes: 2
Views: 76
Reputation: 1
concat
>>> df = pd.concat([df1, df2])
>>> df = df.reset_index(drop=True)
group by
>>> df_gpby = df.groupby(list(df.columns))
get index of unique records
>>> idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]
filter
>>> df.reindex(idx)
Date Fruit Num Color
9 2013-11-25 Orange 8.6 Orange
8 2013-11-25 Apple 22.1 Red
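With this method you can find the rows that differ by their index values. You can then add a new column that is False for those index values and True for all others. (The output shown above comes from a different example dataset, not the one in the question.)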
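As a rough sketch of how the steps above might be applied to the question's data: the frame names `inputdf` and `outputdf` and the column name `new` are assumptions for illustration, and the approach assumes `inputdf` has a default 0..n-1 index and that neither frame contains duplicate rows on its own.

```python
import pandas as pd

# Sample data copied from the question
inputdf = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2200, 2200, 2200, 2100, 2100, 2100],
})
outputdf = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2100, 2100, 2200, 2000, 2000, 2100],
})

# Stack both frames; input rows keep positions 0..len(inputdf)-1
df = pd.concat([inputdf, outputdf]).reset_index(drop=True)

# Rows that appear only once across both frames have no identical partner
df_gpby = df.groupby(list(df.columns))
idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]

# An input row is a match unless its position is among the unique rows
inputdf["new"] = ~inputdf.index.isin(idx)
print(inputdf)
```

With the question's data, the rows at OFFSET 16 and 23 of document 0 and at OFFSET 0 and 5 of document 1 end up False.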
Upvotes: 0
Reputation: 862511
If both DataFrames have the same index values and the same values in the first two columns, use:
inputdf['new'] = inputdf['PredictedFeature'] == outputdf['PredictedFeature']
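A minimal sketch of that comparison on the question's sample data follows. It also shows one possible alternative, a key-based `merge` on `Document_ID` and `OFFSET`, for the case where the index or row order of the two frames cannot be assumed to match. The names `merged` and `match` and the `_in`/`_out` suffixes are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
inputdf = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2200, 2200, 2200, 2100, 2100, 2100],
})
outputdf = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2100, 2100, 2200, 2000, 2000, 2100],
})

# Alternative (assumption: rows may not line up by index):
# pair rows explicitly on the two key columns, then compare
merged = inputdf.merge(
    outputdf,
    on=["Document_ID", "OFFSET"],
    how="left",
    suffixes=("_in", "_out"),
)
merged["match"] = merged["PredictedFeature_in"] == merged["PredictedFeature_out"]

# Direct comparison from the answer: relies on identical index and row order
inputdf["new"] = inputdf["PredictedFeature"] == outputdf["PredictedFeature"]

print(inputdf)
print(merged)
```

Both approaches give the same result here because the two frames are already aligned; the merge version is only worth the extra step when that alignment is in doubt.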
Upvotes: 1