wolverinejohn

Reputation: 29

Unit Testing Pandas DataFrame

I want to write a unit test that compares two DataFrames and returns True if their lengths are the same. If they are not, it should return the difference in length as well as the rows that are missing.

For instance:

Example 1:

df1 = {0,1,2,3,4}
df2 = {0,1,2,3,4}

True

Example 2:

df1 = {0,1,2,3,4}
df2 = {0,2,3,4}

False. Item 2 is missing.

That is, it notifies me that the second item in df1 (the value 1) is missing from df2.

Is this something that is possible?

Upvotes: 0

Views: 3134

Answers (2)

Bonifacio2

Reputation: 3850

I think you must first decide what you want: either a unit test or a function that returns the difference between two data frames.

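In the former case, you can use pd.testing.assert_frame_equal (named pd.util.testing.assert_frame_equal in older pandas versions):

import numpy as np
import pandas as pd

first = pd.DataFrame(np.arange(16).reshape((4, 4)), columns=['A', 'B', 'C', 'D'])
first.loc[0, 'A'] = 99  # change one value so the two frames differ
second = pd.DataFrame(np.arange(16).reshape((4, 4)), columns=['A', 'B', 'C', 'D'])

pd.testing.assert_frame_equal(first, second)  # raises AssertionError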

If your DataFrames differ, you'll get an assertion error:

AssertionError: DataFrame.iloc[:, 0] are different

DataFrame.iloc[:, 0] values are different (25.0 %)
[left]:  [99, 4, 8, 12]
[right]: [0, 4, 8, 12]

In the latter case, if you really want a function that tells you how many rows are missing and how one data frame differs from the other, then what you are looking for is not a unit test.
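For that second case, a helper along the following lines might be a starting point. This is only a sketch: the name compare_frames is hypothetical, and it assumes both frames share the same columns and that a row counts as "missing" when its values do not appear anywhere in the other frame.

```python
import pandas as pd


def compare_frames(df1: pd.DataFrame, df2: pd.DataFrame):
    """Hypothetical helper: report rows of df1 that are absent from df2.

    Returns a tuple (same, length_diff, missing_rows).
    Assumes df1 and df2 have the same column names.
    """
    # Left-merge on all shared columns; the indicator column marks rows
    # that were found only in df1 as "left_only".
    merged = df1.merge(df2.drop_duplicates(), how="left", indicator=True)
    missing = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
    length_diff = len(df1) - len(df2)
    same = length_diff == 0 and missing.empty
    return same, length_diff, missing


df1 = pd.DataFrame({"value": [0, 1, 2, 3, 4]})
df2 = pd.DataFrame({"value": [0, 2, 3, 4]})

same, length_diff, missing = compare_frames(df1, df2)
print(same, length_diff)
print(missing)
```

With the frames from Example 2 in the question, this reports one missing row, the one holding the value 1. Note that the merge resets the index, so the positions in the returned frame refer to row order in df1 rather than to its original index labels.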

Upvotes: 3

Stian

Reputation: 784

Check out pd.testing (named pd.util.testing in older pandas versions).

For your problem you could do pd.testing.assert_frame_equal(df1, df2)
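In a test suite, an assertion like this is usually wrapped in a test function. The sketch below assumes a pytest-style test and a recent pandas version, where the public name is pd.testing.assert_frame_equal; build_frame is a hypothetical stand-in for whatever code produces the DataFrame under test.

```python
import pandas as pd


def build_frame() -> pd.DataFrame:
    # Hypothetical function under test; replace with your own code.
    return pd.DataFrame({"value": [0, 1, 2, 3, 4]})


def test_build_frame():
    expected = pd.DataFrame({"value": [0, 1, 2, 3, 4]})
    # Passes silently when the frames match; raises AssertionError
    # with a description of the difference when they do not.
    pd.testing.assert_frame_equal(build_frame(), expected)
```

A test runner such as pytest reports the AssertionError message when the frames differ, which is often enough to locate the mismatch.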

Upvotes: 1
