Reputation: 466

Remove rows having same value in all columns

I have a pandas DataFrame and want to drop every row in which all columns hold exactly the same value. Here's an example to illustrate the idea.

Input:

index  A  B  C  D  E  F ....
 0     1  2  3  1  3  4
 1     2  2  2  2  2  2
 2     5  5  5  5  5  5 
 3     7  7  6  7  7  7

Output:

index  A  B  C  D  E  F ....
 0     1  2  3  1  3  4
 3     7  7  6  7  7  7

The real DataFrame can have many more columns.

Upvotes: 5

Answers (3)

user2285236

An efficient way of doing this with numeric DataFrames is to use the standard deviation (which will be 0 only if all values are the same):

df[df.std(axis=1) > 0]
Out: 
   A  B  C  D  E  F
0  1  2  3  1  3  4
3  7  7  6  7  7  7

As tgrandje points out, floating-point inaccuracy means the standard deviation of a constant row may not come out as exactly zero. Using np.isclose (with numpy imported as np) is more robust:

df[~np.isclose(df.std(axis=1), 0)]

which results in the same answer.
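
For reference, here is a minimal self-contained sketch of the two variants above. The frame reproduces the question's example; the second frame, `floats`, and the names `row_std`, `result` and `float_result` are made up for illustration. It relies on the default tolerances of `np.isclose`.

```python
import numpy as np
import pandas as pd

# The example frame from the question.
df = pd.DataFrame(
    [[1, 2, 3, 1, 3, 4],
     [2, 2, 2, 2, 2, 2],
     [5, 5, 5, 5, 5, 5],
     [7, 7, 6, 7, 7, 7]],
    columns=list("ABCDEF"),
)

# Row-wise standard deviation; a constant row has a deviation of (about) zero.
row_std = df.std(axis=1)

# Keep only the rows whose deviation is not close to zero.
result = df[~np.isclose(row_std, 0)]

# Hypothetical float data: the first row is constant, but its computed
# deviation is not guaranteed to be exactly 0.0, so np.isclose is the safer test.
floats = pd.DataFrame([[0.1] * 6, [0.1, 0.2, 0.1, 0.1, 0.1, 0.1]])
float_result = floats[~np.isclose(floats.std(axis=1), 0)]
```

Note that this only works when every column is numeric, as the answer says.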

Timings with 40k rows:

%timeit df[df.std(axis=1) > 0]
1000 loops, best of 3: 1.69 ms per loop

%timeit df[df.nunique(1) > 1]
1 loop, best of 3: 2.62 s per loop

Upvotes: 12

BENY

Reputation: 323266

Using nunique, which counts the distinct values in each row when called with axis=1:

df = df[df.nunique(axis=1) > 1]
df
Out[286]: 
       A  B  C  D  E  F
index                  
0      1  2  3  1  3  4
3      7  7  6  7  7  7
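
Unlike the standard deviation, `nunique` does not need numeric data. A small sketch with string columns, using a made-up frame that is not from the question:

```python
import pandas as pd

# Hypothetical frame with string values.
df = pd.DataFrame(
    {
        "A": ["x", "y", "z"],
        "B": ["x", "y", "w"],
        "C": ["x", "y", "z"],
    }
)

# Count the distinct values per row and keep rows with more than one.
distinct_per_row = df.nunique(axis=1)
kept = df[distinct_per_row > 1]
```

Keep in mind that `nunique` ignores missing values by default (`dropna=True`), so a row such as `[1, NaN, 1]` counts as having a single distinct value and would be dropped.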

Upvotes: 5

MaxU - stand with Ukraine

Reputation: 210842

Yet another efficient way (though not as fast as @ayhan's solution):

In [17]: df[~df.eq(df.iloc[:, 0], axis=0).all(1)]
Out[17]:
       A  B  C  D  E  F
index
0      1  2  3  1  3  4
3      7  7  6  7  7  7
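
The one-liner above can be unpacked step by step. This is only a sketch on the question's example frame; the intermediate names (`first_col`, `same_as_first`, `all_same`) are introduced here for readability.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 1, 3, 4],
     [2, 2, 2, 2, 2, 2],
     [5, 5, 5, 5, 5, 5],
     [7, 7, 6, 7, 7, 7]],
    columns=list("ABCDEF"),
)

# Take the first column as the reference value for each row.
first_col = df.iloc[:, 0]

# Compare every column with the reference, aligning on the row index (axis=0).
same_as_first = df.eq(first_col, axis=0)

# A row is constant when every column equals the reference.
all_same = same_as_first.all(axis=1)

# Keep the rows that are not constant.
result = df[~all_same]
```

Because it only tests equality, this approach should also work for non-numeric columns.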

Timing for a DataFrame with 40,000 rows:

In [19]: df.shape
Out[19]: (40000, 6)

In [20]: %timeit df[df.std(axis=1) > 0]
5.62 ms ± 162 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [21]: %timeit df[df.nunique(1)>1]
9.87 s ± 104 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [23]: %timeit df[~df.eq(df.iloc[:, 0], axis=0).all(1)]
13 ms ± 86.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
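
To re-run the comparison outside IPython, something along these lines could be used. The data here is synthetic (random integers, which is an assumption; the frame used for the timings above is not shown), and `make_frame`, `candidates` and `benchmark` are hypothetical helper names. The numbers will vary with the pandas version and the machine.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n_rows=40_000, seed=0):
    """Build a synthetic six-column integer frame (assumed data, not the original)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(rng.integers(0, 3, size=(n_rows, 6)), columns=list("ABCDEF"))


# The three approaches from the answers.
candidates = {
    "std": lambda df: df[df.std(axis=1) > 0],
    "nunique": lambda df: df[df.nunique(axis=1) > 1],
    "eq": lambda df: df[~df.eq(df.iloc[:, 0], axis=0).all(axis=1)],
}


def benchmark(df, repeat=3):
    """Return the best wall-clock time in seconds for each approach."""
    return {
        name: min(timeit.repeat(lambda: fn(df), number=1, repeat=repeat))
        for name, fn in candidates.items()
    }
```

Calling `benchmark(make_frame())` returns a dict mapping each approach to its best time in seconds.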

Upvotes: 3
