code base 5000

Reputation: 4112

Compare two pandas DataFrames

I have two pandas dataframes defined as such:

import pandas as pd

_columns = ["ID", "Name", "GPA"]

_data_orig = [
    [1, "Bob", 3.0],
    [2, "Sam", 2.0],
    [3, "Jane", 4.0]
]

_data_new = [
    [1, "Bob", 3.2],
    [3, "Jane", 3.9],
    [4, "John", 1.2],
    [5, "Lisa", 2.2]
]

df1 = pd.DataFrame(data=_data_orig, columns=_columns)
df2 = pd.DataFrame(data=_data_new, columns=_columns)

I need to find the following information:

To find changed rows, I figured I could loop through df2 and check each row against df1, but that seems slow, so I'm hoping for a faster solution.

For the other two operations, I really do not know what to do because when I try to compare the two dataframes I get:

ValueError: Can only compare identically-labeled DataFrame objects

Pandas version: '0.16.1'

Suggestions?

Upvotes: 5

Views: 4519

Answers (2)

piRSquared

Reputation: 294526

setup

m = df1.merge(df2, on=['ID', 'Name'], how='outer', suffixes=['', '_'], indicator=True)
m


adds

m.loc[m._merge.eq('right_only')]
or
m.query('_merge == "right_only"')


deletes

m.loc[m._merge.eq('left_only')]
or
m.query('_merge == "left_only"')

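changes

The question also asks for changed rows, which the adds/deletes filters above do not cover. A minimal sketch, assuming a pandas version that supports indicator=True and that a "change" means a row with the same ID and Name in both frames but a different GPA (the variable name changes is only illustrative):

```python
import pandas as pd

columns = ["ID", "Name", "GPA"]
df1 = pd.DataFrame([[1, "Bob", 3.0], [2, "Sam", 2.0], [3, "Jane", 4.0]],
                   columns=columns)
df2 = pd.DataFrame([[1, "Bob", 3.2], [3, "Jane", 3.9],
                    [4, "John", 1.2], [5, "Lisa", 2.2]],
                   columns=columns)

# Same merge as in the setup: GPA comes from df1, GPA_ from df2.
m = df1.merge(df2, on=["ID", "Name"], how="outer",
              suffixes=("", "_"), indicator=True)

# Keep rows found in both frames whose GPA values differ.
# Assumption: exact float inequality is good enough for this data.
changes = m.loc[(m["_merge"] == "both") & (m["GPA"] != m["GPA_"])]
print(changes)
```

If GPA values come out of arithmetic rather than being typed in directly, a tolerance-based comparison may be a safer choice than exact inequality.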


0.16.1 answer

setup

m = df1.merge(df2, on=['ID', 'Name'], how='outer', suffixes=['', '_'])
m


adds

m.loc[m.GPA_.notnull() & m.GPA.isnull()]


deletes

m.loc[m.GPA_.isnull() & m.GPA.notnull()]

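changes

Without the indicator column, changed rows can probably be found with the same null checks used for adds and deletes. This is a sketch only, and it assumes GPA is never missing in either original frame, so a null after the merge always means "row absent from that side":

```python
import pandas as pd

columns = ["ID", "Name", "GPA"]
df1 = pd.DataFrame([[1, "Bob", 3.0], [2, "Sam", 2.0], [3, "Jane", 4.0]],
                   columns=columns)
df2 = pd.DataFrame([[1, "Bob", 3.2], [3, "Jane", 3.9],
                    [4, "John", 1.2], [5, "Lisa", 2.2]],
                   columns=columns)

# Outer merge without indicator: GPA is df1's value, GPA_ is df2's value.
m = df1.merge(df2, on=["ID", "Name"], how="outer", suffixes=("", "_"))

# A row exists in both frames when neither GPA column is null
# (assumption: the source frames contain no missing GPA values).
in_both = m["GPA"].notnull() & m["GPA_"].notnull()

# Changed rows: present on both sides, but with different GPA values.
changes = m.loc[in_both & (m["GPA"] != m["GPA_"])]
print(changes)
```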

Upvotes: 6

Steven G

Reputation: 17152

doing this:

df1.set_index(['Name','ID'])-df2.set_index(['Name','ID'])
Out[108]: 
            GPA
Name ID        
Bob  1  -0.2000
Jane 3   0.1000
John 4      nan
Lisa 5      nan
Sam  2      nan

would let you screen for differences between df1 and df2. NaN marks values that do not intersect, i.e. rows that appear in only one of the two DataFrames.
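As a rough illustration of how that result could be split up, the sketch below separates changed rows from unmatched ones. The names diff, changed and unmatched are made up for the example, and it assumes GPA is never missing in the input frames:

```python
import pandas as pd

columns = ["ID", "Name", "GPA"]
df1 = pd.DataFrame([[1, "Bob", 3.0], [2, "Sam", 2.0], [3, "Jane", 4.0]],
                   columns=columns)
df2 = pd.DataFrame([[1, "Bob", 3.2], [3, "Jane", 3.9],
                    [4, "John", 1.2], [5, "Lisa", 2.2]],
                   columns=columns)

# Arithmetic aligns the two frames on the (Name, ID) index;
# labels missing from either side produce NaN.
diff = df1.set_index(["Name", "ID"]) - df2.set_index(["Name", "ID"])

# Nonzero, non-null differences are rows whose GPA changed.
changed = diff[diff["GPA"].notnull() & (diff["GPA"] != 0)]

# NaN rows exist in only one frame (assumption: no missing GPA in the inputs).
unmatched = diff[diff["GPA"].isnull()]

print(changed)
print(unmatched)
```

Note that unmatched does not say which frame a row came from, so this approach alone cannot tell adds from deletes.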

Upvotes: 0
