KansaiRobot

Reputation: 9982

Using pytest with dataframes to test specific columns

I am writing pytest tests that use pandas DataFrames, and I am trying to write the code as generally as I can. (I could always check element by element, but I am trying to avoid that.)

So I have an input DataFrame that contains an ID column, like this:

ID,othervalue, othervalue2
00001,  4,   3
00001,  3,   3
00001,  2,   0
00003,  5,   2
00003,  2,   1
00003,  2,   9
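The IDs carry leading zeros, which pandas drops if it infers the column as integers. Assuming the input is read from CSV text like the sample above (the question does not say where it comes from), a minimal sketch that keeps ID as a string:

```python
import io

import pandas as pd

# The sample input from the question, as CSV text
csv_text = """ID,othervalue, othervalue2
00001,  4,   3
00001,  3,   3
00001,  2,   0
00003,  5,   2
00003,  2,   1
00003,  2,   9
"""

# dtype={"ID": str} keeps the leading zeros ("00001" rather than 1);
# skipinitialspace=True drops the padding after each comma
df = pd.read_csv(io.StringIO(csv_text), dtype={"ID": str}, skipinitialspace=True)
print(df)
```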

and I do:

def test_df_against_angle(df, angle):
    result = do_some_calculation(df, angle)

Now, result is also a DataFrame that contains an ID column, and it also contains a decision column that can take values like "plus" or "minus" (or "pass"/"fail", or something like that). Something like:

ID, someresult,  decision, someotherresult
00001,   4,       plus,       3
00001,   2,       plus,       2
00002,   2,       minus,       2
00002,   1,       minus,       5
00002,   0,       minus,       9

I want to add an assertion (or several) that checks the following (not all at once; I mean separate assertions, since I have not yet decided which would be best):

  1. All decision values corresponding to an ID are the same
  2. The decision values corresponding to an ID are different from those of the other IDs
  3. The decision of ID 00001 is plus and the one of 00002 is minus

I know that pandas has some assertions for comparing DataFrames for equality, but how can I approach this situation?
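For the pandas equality assertions mentioned above, one option (a sketch, assuming `result` has the ID and decision columns shown earlier) is to collapse `result` to one decision per ID and compare it with an expected Series using `pandas.testing.assert_series_equal`. The sample frame is copied from the question; the `expected` values are an assumption about what the test should require.

```python
import pandas as pd

# Sample result from the question
result = pd.DataFrame({
    "ID": ["00001", "00001", "00002", "00002", "00002"],
    "someresult": [4, 2, 2, 1, 0],
    "decision": ["plus", "plus", "minus", "minus", "minus"],
    "someotherresult": [3, 2, 2, 5, 9],
})

# Keep one row per distinct (ID, decision) pair and index it by ID
per_id = result.drop_duplicates(["ID", "decision"]).set_index("ID")["decision"]

# Assumed expectation for condition 3
expected = pd.Series(
    ["plus", "minus"],
    index=pd.Index(["00001", "00002"], name="ID"),
    name="decision",
)

# Raises AssertionError describing the mismatch if the two differ
pd.testing.assert_series_equal(per_id, expected)
```

If an ID has mixed decisions (condition 1 fails), `per_id` gets an extra row and the comparison fails; because the expected values are distinct, condition 2 is covered as well.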

Upvotes: 0

Views: 705

Answers (1)

jezrael

Reputation: 863166

IIUC, you can use the following for all the tests:

# first test: each ID group has exactly 1 unique decision value
assert df.groupby('ID')['decision'].nunique().eq(1).all()

# second test: no decision value of an ID group appears in any other ID group
assert not df.groupby('ID').apply(lambda x: df.loc[df['ID'].ne(x.name),'decision'].isin(x['decision']).any()).any()

# second and third tests, using the rows of the first 2 unique IDs
uniq = df['ID'].unique()
s1 = df.loc[df['ID'].eq(uniq[0]), 'decision']
s2 = df.loc[df['ID'].eq(uniq[1]), 'decision']

assert not s1.isin(s2).any()

# third test: all decisions of the first ID are plus and all of the second are minus
assert s1.eq('plus').all() and s2.eq('minus').all()
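The `s1`/`s2` version only looks at the first two IDs. A more general sketch (the helper name `check_decisions` and its `expected` argument are hypothetical, not part of pandas or pytest) reduces the frame to its distinct (ID, decision) pairs: condition 1 holds when every ID appears once among those pairs, and condition 2 when every decision does.

```python
import pandas as pd


def check_decisions(result, expected=None):
    """Hypothetical helper: assert the three decision conditions on `result`."""
    # One row per distinct (ID, decision) pair
    pairs = result[["ID", "decision"]].drop_duplicates()

    # 1. Each ID has a single decision, so no ID repeats among the pairs
    assert pairs["ID"].is_unique, "some ID has more than one decision"

    # 2. No two IDs share a decision, so no decision repeats among the pairs
    assert pairs["decision"].is_unique, "some decision is used by several IDs"

    # 3. Optionally compare against an explicit ID -> decision mapping
    if expected is not None:
        actual = dict(zip(pairs["ID"], pairs["decision"]))
        assert actual == expected, f"expected {expected}, got {actual}"


# Decision columns of the sample result from the question
result = pd.DataFrame({
    "ID": ["00001", "00001", "00002", "00002", "00002"],
    "decision": ["plus", "plus", "minus", "minus", "minus"],
})
check_decisions(result, {"00001": "plus", "00002": "minus"})
```

Inside a pytest test this could be called as `check_decisions(do_some_calculation(df, angle), {...})`, so a failure reports which condition broke.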

Testing the second condition (here ID 00005 reuses minus, so the check should fail):

print(df)
      ID someresult decision someotherresult
0  00001          4     plus               3
1  00001          2     plus               2
2  00002          2    minus               2
3  00002          1    minus               5
4  00002          0    minus               9
5  00005          2    minus               2

print(df.groupby('ID').apply(lambda x: df.loc[df['ID'].ne(x.name),'decision'].isin(x['decision'])))
ID      
00001  2    False
       3    False
       4    False
       5    False
00002  0    False
       1    False
       5     True
00005  0    False
       1    False
       2     True
       3     True
       4     True
Name: decision, dtype: bool

print(df.groupby('ID').apply(lambda x: df.loc[df['ID'].ne(x.name),'decision'].isin(x['decision']).any()))
ID
00001    False
00002     True
00005     True
dtype: bool

print(not df.groupby('ID').apply(lambda x: df.loc[df['ID'].ne(x.name),'decision'].isin(x['decision']).any()).any())
False
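When this check fails, it can help to see which decision is shared. A sketch on the same frame, counting the distinct IDs per decision value (a `drop_duplicates`/`groupby` alternative to the `apply` call, not part of the original answer):

```python
import pandas as pd

# The frame from the failing example: ID 00005 reuses "minus"
df = pd.DataFrame({
    "ID": ["00001", "00001", "00002", "00002", "00002", "00005"],
    "someresult": [4, 2, 2, 1, 0, 2],
    "decision": ["plus", "plus", "minus", "minus", "minus", "minus"],
    "someotherresult": [3, 2, 2, 5, 9, 2],
})

# Number of distinct IDs that use each decision value
ids_per_decision = df.drop_duplicates(["ID", "decision"]).groupby("decision")["ID"].nunique()

# Decision values used by more than one ID; condition 2 holds only if this is empty
shared = ids_per_decision[ids_per_decision > 1]
print(shared)
print(shared.empty)
```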

And a test where it returns True (ID 00005 now has its own decision, yes):

print(df)
      ID someresult decision someotherresult
0  00001          4     plus               3
1  00001          2     plus               2
2  00002          2    minus               2
3  00002          1    minus               5
4  00002          0    minus               9
5  00005          2      yes               2

print(df.groupby('ID').apply(lambda x: df.loc[df['ID'].ne(x.name),'decision'].isin(x['decision'])))
ID      
00001  2    False
       3    False
       4    False
       5    False
00002  0    False
       1    False
       5    False
00005  0    False
       1    False
       2    False
       3    False
       4    False
Name: decision, dtype: bool

print(df.groupby('ID').apply(lambda x: df.loc[df['ID'].ne(x.name),'decision'].isin(x['decision']).any()))
ID
00001    False
00002    False
00005    False
dtype: bool

print(not df.groupby('ID').apply(lambda x: df.loc[df['ID'].ne(x.name),'decision'].isin(x['decision']).any()).any())
True

Upvotes: 1
