ganesh kaspate

Reputation: 2685

Compare a column of two DataFrames on the basis of the IDs in each DataFrame

I am totally new to Python. I have two DataFrames from the same dataset: one is the input and the other is the output.

Here is my input DataFrame:

Document_ID  OFFSET  PredictedFeature
0            0       2000
0            8       2000
0            16      2200
0            23      2200
0            30      2200
1            0       2100
1            5       2100
1            7       2100

I give this as input to my ML model, which returns its output in the same format.

My output looks like this:

Document_ID  OFFSET  PredictedFeature
0            0       2000
0            8       2000
0            16      2100
0            23      2100
0            30      2200
1            0       2000
1            5       2000
1            7       2100

What I am trying to do with these two DataFrames is this: for each Document_ID and OFFSET, check whether the input feature is the same as the output feature. If it is, I want to add True as the value in a new column; if it is not, I want to add False.

For example, in the sample data, for ID 0 and offset 16 the input feature is 2200 and the output feature is 2100, so the value is False.

Can anyone please help me with this? Anything will be helpful.

Upvotes: 2

Views: 76

Answers (2)

soundaraj

Reputation: 1

concat

>>> df = pd.concat([df1, df2])
>>> df = df.reset_index(drop=True)

group by

>>> df_gpby = df.groupby(list(df.columns))

get index of unique records

>>> idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]

filter

>>> df.reindex(idx)
         Date   Fruit   Num   Color
9  2013-11-25  Orange   8.6  Orange
8  2013-11-25   Apple  22.1     Red

With this method you can find the differing rows by their index values. You can then add a new column that is False for those index values and True for all the others.
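As a rough sketch, the steps above could be applied to the question's sample data as follows. The names `inputdf` and `outputdf` and the column name `new` are chosen here for illustration, and the sketch assumes that neither DataFrame contains duplicate rows of its own and that `inputdf` has the default 0..n-1 index.

```python
import pandas as pd

# Sample data copied from the question.
inputdf = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2200, 2200, 2200, 2100, 2100, 2100],
})
outputdf = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2100, 2100, 2200, 2000, 2000, 2100],
})

# Stack both frames: input rows keep positions 0..7, output rows get 8..15.
df = pd.concat([inputdf, outputdf]).reset_index(drop=True)

# Rows present in both frames form groups of size 2;
# a group of size 1 is a row with no identical partner in the other frame.
groups = df.groupby(list(df.columns)).groups
idx = [rows[0] for rows in groups.values() if len(rows) == 1]

# Positions in idx that fall inside inputdf's index mark the mismatching input rows.
inputdf["new"] = ~inputdf.index.isin(idx)
print(inputdf)
```

Because the comparison is made on whole rows, this flags an input row as False whenever no output row has the same Document_ID, OFFSET and PredictedFeature.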

Upvotes: 0

jezrael

Reputation: 862511

If both DataFrames have the same index values and the same values in the first two columns, use:

inputdf['new'] = inputdf['PredictedFeature'] == outputdf['PredictedFeature']
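If the index values or the row order might differ between the two DataFrames, one possible alternative is to align the rows explicitly on `Document_ID` and `OFFSET` with a merge before comparing. This is only a sketch built on the question's sample data; the suffixes `_in` and `_out` and the name `merged` are illustrative choices, and it assumes each (Document_ID, OFFSET) pair appears at most once per DataFrame.

```python
import pandas as pd

# Sample data copied from the question.
inputdf = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2200, 2200, 2200, 2100, 2100, 2100],
})
outputdf = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2100, 2100, 2200, 2000, 2000, 2100],
})

# Pair each input row with the output row that has the same ID and offset.
merged = inputdf.merge(
    outputdf,
    on=["Document_ID", "OFFSET"],
    how="left",
    suffixes=("_in", "_out"),
)

# True where the model's output matches the input feature, False otherwise.
merged["new"] = merged["PredictedFeature_in"] == merged["PredictedFeature_out"]
print(merged)
```

With `how="left"`, an input row that has no matching output row gets a missing value in `PredictedFeature_out`, so the comparison yields False for it.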

Upvotes: 1
