Reputation: 29
I'd like to write a unit test that compares two DataFrames and returns True if they have the same length; if they don't, it should return the difference in length and which rows are missing.
For example:
Example 1:
df1 = {0,1,2,3,4}
df2 = {0,1,2,3,4}
True
Example 2:
df1 = {0,1,2,3,4}
df2 = {0,2,3,4}
False. The 2nd item is missing.
That is, it should tell me that the second item in df1 (the value 1) is missing from df2.
Is this possible?
Upvotes: 0
Views: 3134
Reputation: 3850
I think you first need to decide what you want: a unit test, or a function that returns the difference between two data frames.
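In the former case, you could use pd.util.testing.assert_frame_equal (exposed as pd.testing.assert_frame_equal in newer pandas versions, where pd.util.testing is deprecated):
import numpy as np
import pandas as pd
first = pd.DataFrame(np.arange(16).reshape((4, 4)), columns=['A', 'B', 'C', 'D'])
first.loc[0, 'A'] = 99  # .loc avoids chained assignment
second = pd.DataFrame(np.arange(16).reshape((4, 4)), columns=['A', 'B', 'C', 'D'])
pd.testing.assert_frame_equal(first, second)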
If your DataFrames differ, you'll get an AssertionError:
AssertionError: DataFrame.iloc[:, 0] are different
DataFrame.iloc[:, 0] values are different (25.0 %)
[left]: [99, 4, 8, 12]
[right]: [0, 4, 8, 12]
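To run that check inside an actual unit test, here is a minimal sketch using the standard-library unittest module. The class and test-method names are hypothetical, and it assumes a pandas version where pd.testing.assert_frame_equal is available. A passing comparison simply returns; a mismatch (different values or a missing row) raises AssertionError, which assertRaises catches.

```python
import unittest

import numpy as np
import pandas as pd


class FrameComparisonTest(unittest.TestCase):
    def setUp(self):
        # A fresh 4x4 frame and an identical copy for every test method
        self.first = pd.DataFrame(np.arange(16).reshape((4, 4)), columns=["A", "B", "C", "D"])
        self.second = self.first.copy()

    def test_identical_frames_pass(self):
        # No exception means the frames are considered equal
        pd.testing.assert_frame_equal(self.first, self.second)

    def test_modified_value_fails(self):
        self.first.loc[0, "A"] = 99
        # A differing value makes assert_frame_equal raise AssertionError
        with self.assertRaises(AssertionError):
            pd.testing.assert_frame_equal(self.first, self.second)

    def test_missing_row_fails(self):
        # Dropping a row changes the shape, which is also reported as a failure
        shorter = self.second.drop(index=2)
        with self.assertRaises(AssertionError):
            pd.testing.assert_frame_equal(self.first, shorter)


if __name__ == "__main__":
    # exit=False keeps the interpreter alive after the run (handy in notebooks)
    unittest.main(argv=["frame-tests"], exit=False)
```

Note that the test only tells you *that* the frames differ; the details are in the AssertionError message.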
In the latter case, if you really want a function that tells you how many rows are missing and what differs between the two data frames, then what you are looking for is not a unit test.
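For that second option, a minimal sketch of such a function might look like this. It assumes both frames share the same columns, and it follows the question's spec literally: it returns True as soon as the lengths match, without comparing contents. The name compare_frames is hypothetical, and finding the missing rows with a left merge plus indicator=True is a swapped-in technique, not something from the answer.

```python
import pandas as pd


def compare_frames(df1: pd.DataFrame, df2: pd.DataFrame):
    """Hypothetical helper: True if the lengths match, otherwise a tuple of
    (len(df1) - len(df2), rows of df1 that have no matching row in df2).
    Assumes df2 has (at least) all of df1's columns."""
    if len(df1) == len(df2):
        return True
    # De-duplicate df2 on df1's columns so each df1 row matches at most once
    right = df2[list(df1.columns)].drop_duplicates()
    # Left join on all shared columns; indicator=True adds a "_merge" column
    # saying whether each df1 row found a partner ("both") or not ("left_only")
    merged = df1.merge(right, how="left", indicator=True)
    # merged has one row per df1 row in the same order, so apply the mask
    # positionally to keep df1's original index on the returned rows
    missing_mask = (merged["_merge"] == "left_only").to_numpy()
    return len(df1) - len(df2), df1[missing_mask]


df1 = pd.DataFrame({"value": [0, 1, 2, 3, 4]})
df2 = pd.DataFrame({"value": [0, 2, 3, 4]})
result = compare_frames(df1, df2)  # (1, the row of df1 holding value 1)
```

If df2 is longer than df1, the difference comes back negative and rows that exist only in df2 are not reported; swapping the arguments covers that direction.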
Upvotes: 3
Reputation: 784
Check out pd.util.testing (deprecated in newer pandas versions in favor of pd.testing).
For your problem, you could use pd.util.testing.assert_equal(df1, df2).
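If you want a plain True/False instead of an exception (the question asks for True on a match), DataFrame.equals is a built-in alternative. The sketch below uses the question's example values as a single hypothetical column named value and contrasts it with pd.testing.assert_frame_equal, whose error message describes the mismatch.

```python
import pandas as pd

df1 = pd.DataFrame({"value": [0, 1, 2, 3, 4]})
df2 = pd.DataFrame({"value": [0, 2, 3, 4]})

# DataFrame.equals returns a boolean rather than raising
same = df1.equals(df2)  # False: the shapes and values differ

# assert_frame_equal raises AssertionError; keep its message for reporting
try:
    pd.testing.assert_frame_equal(df1, df2)
    message = None
except AssertionError as exc:
    message = str(exc)
```

Neither call lists the missing rows on its own; for that you still need something like a merge-based comparison.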
Upvotes: 1