Reputation: 789

Pandas - check if dataframe has negative value in any column

How can I check whether a pandas dataframe has a negative value in one or more columns and get back a single boolean (True or False)? Can you please help?

In[1]: df = pd.DataFrame(np.random.randn(10, 3))

In[2]: df
Out[2]:
          0         1         2
0 -1.783811  0.736010  0.865427
1 -1.243160  0.255592  1.670268
2  0.820835  0.246249  0.288464
3 -0.923907 -0.199402  0.090250
4 -1.575614 -1.141441  0.689282
5 -1.051722  0.513397  1.471071
6  2.549089  0.977407  0.686614
7 -1.417064  0.181957  0.351824
8  0.643760  0.867286  1.166715
9 -0.316672 -0.647559  1.331545

Expected output:

Out[3]: True

Upvotes: 12

Answers (3)

billjoie

Reputation: 968

If speed is important, here are a few timings I ran:

df = pd.DataFrame(np.random.randn(10000, 30000))

Test 1, slowest: pure pandas

(df < 0).any().any()
# 303 ms ± 1.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Test 2, faster: switching over to numpy with .values for testing the presence of a True entry

(df < 0).values.any()
# 269 ms ± 8.19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Test 3, possibly faster still, though the difference is not significant: switching over to numpy for the whole thing

(df.values < 0).any()
# 267 ms ± 1.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
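
As a quick sanity check, the three variants above should always agree. Here is a minimal sketch on a much smaller frame (the size, the seed, and the forced negative entry are my own assumptions, chosen only to keep it cheap and deterministic). It also includes .to_numpy(), which the pandas documentation recommends over .values; I have not timed it against the others.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the benchmark above used 10000 x 30000.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 50)))
df.iloc[0, 0] = -1.0  # force at least one negative so the result is deterministic

r_pandas = (df < 0).any().any()       # Test 1: pure pandas
r_mixed = (df < 0).values.any()       # Test 2: pandas comparison, numpy reduction
r_numpy = (df.values < 0).any()       # Test 3: numpy for both steps
r_to_numpy = (df.to_numpy() < 0).any()  # same idea as Test 3 via .to_numpy()

print(bool(r_pandas), bool(r_mixed), bool(r_numpy), bool(r_to_numpy))
```

To reproduce timings on your own machine, wrap each expression in %timeit in IPython; the numbers will depend on hardware and on the pandas and numpy versions.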

Upvotes: 18

billjoie

Reputation: 968

This does the trick:

(df < 0).any().any()

To break it down: (df < 0) gives a dataframe of booleans with the same shape as df. The first .any() then reduces each column to one boolean, returning a series that says whether that column contains a True value. The second .any() asks whether that series itself contains any True value.

This returns a single boolean:

True
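
The intermediate steps can be made visible on a tiny frame. This is only an illustrative sketch: the column names and values are made up, and the variable names are mine.

```python
import pandas as pd

# Hypothetical data: only column "b" contains a negative value.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [0.5, -0.2, 4.0],
    "c": [7.0, 8.0, 9.0],
})

mask = df < 0                     # boolean DataFrame, same shape as df
per_column = mask.any()           # boolean Series, one entry per column
has_negative = per_column.any()   # single boolean for the whole frame

print(mask)
print(per_column)
print(bool(has_negative))
```

Stopping after the first .any() is useful if you also want to know which columns contain a negative value, not just whether any does.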

Upvotes: 6

BENY

Reputation: 323226

You can chain two .any() calls:

df.lt(0).any().any()
Out[96]: True
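
Here df.lt(0) is the method form of df < 0, so this is the same check as the operator version. A small sketch with made-up data shows the equivalence, and also that NaN entries do not count as negative, since comparisons against NaN evaluate to False.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one NaN and one negative value.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [2.0, 5.0, -4.0]})

same_mask = df.lt(0).equals(df < 0)   # method form and operator form agree
result = df.lt(0).any().any()         # the -4.0 in "y" is negative

# A frame whose only unusual entry is NaN has no negatives.
nan_only = pd.DataFrame({"x": [np.nan, 1.0]}).lt(0).any().any()

print(same_mask, bool(result), bool(nan_only))
```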

Upvotes: 5
