Dropping rows in pandas if column values are duplicated in more than 2 columns

Question

This question is almost what I need, but I cannot adapt it for my needs.

I have a df with a lot of columns, where the last 8 columns hold mean scores.

Example

  Column1 Column2  Mean1  Mean2  Mean3  Mean4  Mean5  Mean6  Mean7  Mean8
0       A       X     50     50     50     50     50     50     50     50
1       B       Y     20     21     22     23     24     25     26     27
2       C       Z     50     50     50     63     99     54     24     12
3       D       F     40     41     42     43     44     45     46     47

Reprex

{'Column1': {0: 'A', 1: 'B', 2: 'C', 3: 'D'}, 'Column2': {0: 'X', 1: 'Y', 2: 'Z', 3: 'F'}, 'Mean1': {0: 50, 1: 20, 2: 50, 3: 40}, 'Mean2': {0: 50, 1: 21, 2: 50, 3: 41}, 'Mean3': {0: 50, 1: 22, 2: 50, 3: 42}, 'Mean4': {0: 50, 1: 23, 2: 63, 3: 43}, 'Mean5': {0: 50, 1: 24, 2: 99, 3: 44}, 'Mean6': {0: 50, 1: 25, 2: 54, 3: 45}, 'Mean7': {0: 50, 1: 26, 2: 24, 3: 46}, 'Mean8': {0: 50, 1: 27, 2: 12, 3: 47}}

I want to drop every row in which 3 or more of the 8 mean columns share the same value.

Expected output (the first and third rows were dropped because the value 50 appears three or more times in each)

  Column1 Column2  Mean1  Mean2  Mean3  Mean4  Mean5  Mean6  Mean7  Mean8
1       B       Y     20     21     22     23     24     25     26     27
3       D       F     40     41     42     43     44     45     46     47

versatile_programmer · Accepted Answer

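# Look only at the last 8 columns (the mean scores), transposed so each row becomes a column.
means_t = df.iloc[:, -8:].T
n = []
for number in means_t.columns:
    # Largest number of times any single value occurs in this row.
    if means_t.groupby(number).size().max() >= 3:
        n.append(number)
# drop() returns a new DataFrame, so assign the result to keep it.
df = df.drop(n)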
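
As a possible alternative to the loop above, here is a minimal sketch that counts repeated values per row without transposing. It assumes the mean scores are the last 8 columns, as stated in the question. The names means, max_repeats and result are only illustrative.

```python
import pandas as pd

# Data copied from the question's reprex.
df = pd.DataFrame({
    'Column1': ['A', 'B', 'C', 'D'],
    'Column2': ['X', 'Y', 'Z', 'F'],
    'Mean1': [50, 20, 50, 40],
    'Mean2': [50, 21, 50, 41],
    'Mean3': [50, 22, 50, 42],
    'Mean4': [50, 23, 63, 43],
    'Mean5': [50, 24, 99, 44],
    'Mean6': [50, 25, 54, 45],
    'Mean7': [50, 26, 24, 46],
    'Mean8': [50, 27, 12, 47],
})

# Assumption: the mean scores are the last 8 columns.
means = df.iloc[:, -8:]

# For each row, the highest number of times any single value occurs.
max_repeats = means.apply(lambda row: row.value_counts().max(), axis=1)

# Keep only rows where no value appears 3 or more times.
result = df[max_repeats < 3]
print(result)
```

On the sample data this keeps the rows with index 1 and 3. Note that value_counts ignores NaN by default, so missing values are not counted as duplicates of each other.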
