Reputation: 9982
I am writing pytest tests that use pandas DataFrames, and I am trying to keep the code as general as I can. (I could always check element by element, but I am trying to avoid that.)
I have an input DataFrame that contains an ID column, like this:
ID,othervalue, othervalue2
00001, 4, 3
00001, 3, 3
00001, 2, 0
00003, 5, 2
00003, 2, 1
00003, 2, 9
and I do
def test_df_against_angle(df, angle):
    result = do_some_calculation(df, angle)
Now, result is also a DataFrame that contains an ID column, and it also contains a decision column that can take values like "plus" and "minus" (or "pass" and "fail", or something like that). Something like:
ID, someresult, decision, someotherresult
00001, 4, plus, 3
00001, 2, plus, 2
00002, 2, minus, 2
00002, 1, minus, 5
00002, 0, minus, 9
I want to add an assertion (or several) that asserts the following (not all at once; I mean different assertions, since I have not yet decided which would be better):
I know that pandas has some assertions for comparing DataFrames for equality, but how should I handle this situation?
Upvotes: 0
Views: 705
Reputation: 863166
IIUC, use these for the tests:
# first test: each ID group has exactly 1 unique decision value
assert df.groupby('ID')['decision'].nunique().eq(1).all()
# second test: no decision value of one ID group appears in any other ID group
assert not df.groupby('ID').apply(lambda x: df.loc[df['ID'].ne(x.name),'decision'].isin(x['decision']).any()).any()
# second and third tests, using only the first 2 unique ID values
uniq = df['ID'].unique()
s1 = df.loc[df['ID'].eq(uniq[0]), 'decision']
s2 = df.loc[df['ID'].eq(uniq[1]), 'decision']
assert not s1.isin(s2).all()
# test that all values are 'plus' for the first ID and 'minus' for the second
assert s1.eq('plus').all() and s2.eq('minus').all()
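For use inside pytest, the first check can be wrapped in a small helper that returns the offending IDs, so a failing assertion says which groups are wrong. This is only a sketch: the helper name `ids_with_mixed_decisions` and the test name are made up for illustration, and the column names `ID` and `decision` are taken from the question.

```python
import pandas as pd


def ids_with_mixed_decisions(df: pd.DataFrame) -> list:
    """Return the IDs whose rows do not all share one decision value."""
    counts = df.groupby("ID")["decision"].nunique()
    return counts[counts.ne(1)].index.tolist()


# Data from the question; IDs are strings so the leading zeros survive.
result = pd.DataFrame({
    "ID": ["00001", "00001", "00002", "00002", "00002"],
    "decision": ["plus", "plus", "minus", "minus", "minus"],
})


def test_one_decision_per_id():
    mixed = ids_with_mixed_decisions(result)
    # The message lists the offending IDs when the assertion fails.
    assert not mixed, f"IDs with more than one decision: {mixed}"
```

In a real test, `result` would come from `do_some_calculation(df, angle)` rather than being built by hand.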
Testing the second condition:
print (df)
ID someresult decision someotherresult
0 00001 4 plus 3
1 00001 2 plus 2
2 00002 2 minus 2
3 00002 1 minus 5
4 00002 0 minus 9
5 00005 2 minus 2
print(df.groupby('ID').apply(lambda x: df.loc[df['ID'].ne(x.name),'decision'].isin(x['decision'])))
ID
00001 2 False
3 False
4 False
5 False
00002 0 False
1 False
5 True
00005 0 False
1 False
2 True
3 True
4 True
Name: decision, dtype: bool
print(df.groupby('ID').apply(lambda x: df.loc[df['ID'].ne(x.name),'decision'].isin(x['decision']).any()))
ID
00001 False
00002 True
00005 True
dtype: bool
print(not df.groupby('ID').apply(lambda x: df.loc[df['ID'].ne(x.name),'decision'].isin(x['decision']).any()).any())
False
And a test where the result is True:
print (df)
ID someresult decision someotherresult
0 00001 4 plus 3
1 00001 2 plus 2
2 00002 2 minus 2
3 00002 1 minus 5
4 00002 0 minus 9
5 00005 2 yes 2
print(df.groupby('ID').apply(lambda x: df.loc[df['ID'].ne(x.name),'decision'].isin(x['decision'])))
ID
00001 2 False
3 False
4 False
5 False
00002 0 False
1 False
5 False
00005 0 False
1 False
2 False
3 False
4 False
Name: decision, dtype: bool
print(df.groupby('ID').apply(lambda x: df.loc[df['ID'].ne(x.name),'decision'].isin(x['decision']).any()))
ID
00001 False
00002 False
00005 False
dtype: bool
print(not df.groupby('ID').apply(lambda x: df.loc[df['ID'].ne(x.name),'decision'].isin(x['decision']).any()).any())
True
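As a possible alternative for the second condition, the same question ("does any decision value occur under more than one ID?") can be asked by grouping on `decision` instead of `ID`, which avoids the per-group lambda. This is a sketch rather than a drop-in replacement: the helper name `decisions_shared_between_ids` is hypothetical, and it assumes the `decision` column has no missing values.

```python
import pandas as pd


def decisions_shared_between_ids(df: pd.DataFrame) -> list:
    """Return decision values that occur under more than one ID."""
    ids_per_decision = df.groupby("decision")["ID"].nunique()
    return ids_per_decision[ids_per_decision.gt(1)].index.tolist()


# Same two sample frames as in the outputs above (only the relevant columns).
failing = pd.DataFrame({
    "ID": ["00001", "00001", "00002", "00002", "00002", "00005"],
    "decision": ["plus", "plus", "minus", "minus", "minus", "minus"],
})
passing = failing.assign(
    decision=["plus", "plus", "minus", "minus", "minus", "yes"]
)

print(decisions_shared_between_ids(failing))  # ['minus']
print(decisions_shared_between_ids(passing))  # []
```

An assertion would then be `assert not decisions_shared_between_ids(df)`, and the returned list can go into the failure message.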
Upvotes: 1