OnlyDean

Reputation: 1039

Identical Dataframes Asserting Not Equal - Python Pandas

I am trying to unit test my code. I have a method that, given a MySQL query, returns the result as a pandas DataFrame. Note that in the database, all values returned in the created and external_id columns are NULL. Here is the test:

import pandas as p
from datetime import datetime
from pandas.util.testing import assert_frame_equal

def test_get_data(self):

    ### SET UP

    self.report._query = "SELECT * FROM floor LIMIT 3"
    self.report._columns = ['id', 'facility_id', 'name', 'created', 'modified', 'external_id']
    self.d = {'id': p.Series([1, 2, 3]),
              'facility_id': p.Series([1, 1, 1]),
              'name': p.Series(['1st Floor', '2nd Floor', '3rd Floor']),
              'created': p.Series(['None', 'None', 'None']),
              'modified': p.Series([datetime.strptime('2012-10-06 01:08:27', '%Y-%m-%d %H:%M:%S'),
                                    datetime.strptime('2012-10-06 01:08:27', '%Y-%m-%d %H:%M:%S'),
                                    datetime.strptime('2012-10-06 01:08:27', '%Y-%m-%d %H:%M:%S')]),
              'external_id': p.Series(['None', 'None', 'None'])
              }
    self.df = p.DataFrame(data=self.d, columns=['id', 'facility_id', 'name', 'created', 'modified', 'external_id'])
    self.df = self.df.fillna('None')  # fillna returns a new DataFrame, so reassign it
    print(self.df)

    ### CODE UNDER TEST

    result = self.report.get_data(self.report._cursor_web)
    print(result)

    ### ASSERTIONS

    assert_frame_equal(result, self.df)

Here is the console output. Note the print statements in the test code: the manually constructed DataFrame is on top, and the one returned by the function under test is on the bottom:

   id  facility_id       name created            modified external_id
0   1            1  1st Floor    None 2012-10-06 01:08:27        None
1   2            1  2nd Floor    None 2012-10-06 01:08:27        None
2   3            1  3rd Floor    None 2012-10-06 01:08:27        None
   id  facility_id       name created            modified external_id
0   1            1  1st Floor    None 2012-10-06 01:08:27        None
1   2            1  2nd Floor    None 2012-10-06 01:08:27        None
2   3            1  3rd Floor    None 2012-10-06 01:08:27        None
F
======================================================================
FAIL: test_get_data (__main__.ReportTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/path/to/file/ReportsTestCase.py", line 46, in test_get_data
    assert_frame_equal(result, self.df)
  File "/usr/local/lib/python2.7/site-packages/pandas/util/testing.py", line 1313, in assert_frame_equal
    obj='DataFrame.iloc[:, {0}]'.format(i))
  File "/usr/local/lib/python2.7/site-packages/pandas/util/testing.py", line 1181, in assert_series_equal
    obj='{0}'.format(obj))
  File "pandas/src/testing.pyx", line 59, in pandas._testing.assert_almost_equal (pandas/src/testing.c:4156)
  File "pandas/src/testing.pyx", line 173, in pandas._testing.assert_almost_equal (pandas/src/testing.c:3274)
  File "/usr/local/lib/python2.7/site-packages/pandas/util/testing.py", line 1018, in raise_assert_detail
    raise AssertionError(msg)

AssertionError: DataFrame.iloc[:, 3] are different

DataFrame.iloc[:, 3] values are different (100.0 %)
[left]:  [None, None, None]
[right]: [None, None, None]

----------------------------------------------------------------------
Ran 1 test in 0.354s

FAILED (failures=1)

By my reckoning, the created column contains three string values of 'None' in both the left and right DataFrames. Why does the assertion fail?

Upvotes: 2

Views: 3246

Answers (1)

user2285236

Python also has a built-in constant None that is different from the string 'None'. From the docs:

None

The sole value of the type NoneType. None is frequently used to represent the absence of a value, as when default arguments are not passed to a function. Assignments to None are illegal and raise a SyntaxError.
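A quick way to see the distinction in plain Python (written in Python 3 syntax, unlike the Python 2.7 environment shown in the question's traceback): the two values compare unequal and have different types, yet converting both to text with str() gives the same result, much like the two printed DataFrames in the question. repr() keeps them apart.

```python
# The singleton None and the string 'None' are different objects of different types.
print(None == 'None')            # False
print(type(None), type('None'))  # <class 'NoneType'> <class 'str'>

# Converted to text, both look the same, which hides the difference in printed output.
print(str(None) == str('None'))  # True

# repr() shows the string with quotes, so the two can be told apart.
print(repr(None), repr('None'))  # None 'None'
```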

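When None is compared against 'None' (None == 'None'), the result is False. Therefore, assert_frame_equal raises an AssertionError if one of the DataFrames contains None and the other contains 'None'. The difference is easy to miss because a DataFrame displays both the object None and the string 'None' as None, which is why the two printed frames, and the [left] and [right] lines of the failure message, look identical. Here the database NULLs come back as None, while the manually constructed frame holds the string 'None'.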
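Below is a minimal, self-contained reproduction of the failure and of the fix. It is a sketch under a few assumptions: it uses Python's built-in sqlite3 with an in-memory, cut-down floor table as a stand-in for the MySQL query (the table layout is illustrative only), and it imports assert_frame_equal from pandas.testing, its location in current pandas, rather than the older pandas.util.testing seen in the traceback. The DB-API specification (PEP 249) maps SQL NULL to Python None, so the "result" frame ends up holding None.

```python
import sqlite3

import pandas as pd
from pandas.testing import assert_frame_equal

# Illustrative stand-in for the MySQL table; sqlite3 is used only so this runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE floor (id INTEGER, name TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO floor VALUES (?, ?, NULL)",
    [(1, "1st Floor"), (2, "2nd Floor"), (3, "3rd Floor")],
)

# The driver returns SQL NULL as the Python object None.
cursor = conn.execute("SELECT id, name, created FROM floor ORDER BY id")
columns = [d[0] for d in cursor.description]
result = pd.DataFrame(cursor.fetchall(), columns=columns)

# Expected frame built the way the question builds it: with the *string* 'None'.
expected_str = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["1st Floor", "2nd Floor", "3rd Floor"],
    "created": ["None", "None", "None"],
})

try:
    assert_frame_equal(result, expected_str)
    print("string 'None' vs None: equal")
except AssertionError:
    print("string 'None' vs None: AssertionError")

# Expected frame built with the None object instead: this comparison passes.
expected_none = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["1st Floor", "2nd Floor", "3rd Floor"],
    "created": [None, None, None],
})
assert_frame_equal(result, expected_none)
print("None vs None: equal")
```

Building the expected frame with None rather than the string 'None' is the change that makes the assertion pass in this sketch.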

Upvotes: 1
