Reputation: 14415
I am writing some unit tests for my data and am having trouble coming up with a Pythonic way to check it.
I have a pandas DataFrame:
import pandas as pd
d = {'one' : pd.Series([.14, .52, 1.], index=['a', 'b', 'c']),
     'two' : pd.Series([.57, .25, .33, .98], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
Now I want to verify that the data in these columns falls within the range [0, 1]. I'd like a function:
check_data(df, column)
that returns True if the data falls within the range and False if it doesn't. In my example data, check_data(df, 'one') returns False and check_data(df, 'two') returns True.
My head is trying to take on a row by row approach (thank my years of Excel VBA), but I know that's wrong. Anyone got a better approach?
Upvotes: 2
Views: 1735
Reputation: 176810
You could use between and all to check individual columns:
>>> df['one'].between(0, 1).all()
False
>>> df['two'].between(0, 1).all()
True
between includes the endpoints by default; to exclude them, set inclusive='neither' (pandas versions before 1.3 used inclusive=False instead).
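Wrapped into the check_data function the question asks for, this might look like the sketch below. The function name and signature come from the question; the low and high parameters are my own additions for illustration. Note that column 'one' fails because it has no value for row 'd' (it is NaN there), and between treats NaN as out of range.

```python
import pandas as pd

def check_data(df, column, low=0.0, high=1.0):
    """Return True if every value in df[column] lies in [low, high].

    low and high are illustrative parameters, not part of the question.
    """
    # between is inclusive on both ends by default; NaN yields False
    return bool(df[column].between(low, high).all())

d = {'one': pd.Series([.14, .52, 1.], index=['a', 'b', 'c']),
     'two': pd.Series([.57, .25, .33, .98], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

print(check_data(df, 'one'))  # False: row 'd' is NaN in column 'one'
print(check_data(df, 'two'))  # True
```

If missing values should not count as failures, dropping them first (for example with df[column].dropna()) is one option.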
You could also check every column of the DataFrame at once if you wished:
>>> ((0 <= df) & (df <= 1)).all()
one    False
two     True
dtype: bool
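For a unit test you may want a single True/False for the whole frame rather than one value per column. A minimal sketch, assuming the same example data; the variable names are my own, and the NaN-tolerant variant at the end is an assumption about what you might want, not something the question asks for:

```python
import pandas as pd

d = {'one': pd.Series([.14, .52, 1.], index=['a', 'b', 'c']),
     'two': pd.Series([.57, .25, .33, .98], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# Element-wise mask; comparisons against NaN evaluate to False
in_range = (df >= 0) & (df <= 1)

per_column = in_range.all()             # one boolean per column
everything_ok = bool(per_column.all())  # single boolean for the frame

# Hypothetical variant: treat missing values as acceptable
in_range_or_missing = in_range | df.isna()
ok_ignoring_nan = bool(in_range_or_missing.all().all())
```

Here everything_ok is False only because of the NaN in column 'one'; ok_ignoring_nan is True.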
Upvotes: 6