Anakin Skywalker

Reputation: 2520

Dropping rows in pandas if column values are duplicated in more than 2 columns

This question is almost what I need, but I cannot adapt it for my needs.

I have a df with a lot of columns, where the last 8 columns hold mean scores.

Example

  Column1 Column2  Mean1  Mean2  Mean3  Mean4  Mean5  Mean6  Mean7  Mean8
0       A       X     50     50     50     50     50     50     50     50
1       B       Y     20     21     22     23     24     25     26     27
2       C       Z     50     50     50     63     99     54     24     12
3       D       F     40     41     42     43     44     45     46     47

Reprex

{'Column1': {0: 'A', 1: 'B', 2: 'C', 3: 'D'}, 'Column2': {0: 'X', 1: 'Y', 2: 'Z', 3: 'F'}, 'Mean1': {0: 50, 1: 20, 2: 50, 3: 40}, 'Mean2': {0: 50, 1: 21, 2: 50, 3: 41}, 'Mean3': {0: 50, 1: 22, 2: 50, 3: 42}, 'Mean4': {0: 50, 1: 23, 2: 63, 3: 43}, 'Mean5': {0: 50, 1: 24, 2: 99, 3: 44}, 'Mean6': {0: 50, 1: 25, 2: 54, 3: 45}, 'Mean7': {0: 50, 1: 26, 2: 24, 3: 46}, 'Mean8': {0: 50, 1: 27, 2: 12, 3: 47}}

I want to drop every row in which the same value appears in 3 or more of the 8 mean columns.

Expected output (the first and third rows were dropped because the value 50 appears three or more times in each)

  Column1 Column2  Mean1  Mean2  Mean3  Mean4  Mean5  Mean6  Mean7  Mean8
1       B       Y     20     21     22     23     24     25     26     27
3       D       F     40     41     42     43     44     45     46     47

Upvotes: 1

Views: 90

Answers (1)

versatile_programmer

Reputation: 235

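# Look only at the last 8 (mean) columns so Column1/Column2 are not counted.
means = df.iloc[:, -8:]
n = []
for label in means.index:
    # Collect rows where the most frequent value occurs 3 or more times.
    if means.loc[label].value_counts().max() >= 3:
        n.append(label)
# drop() returns a new frame, so assign the result back.
df = df.drop(n)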
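
A more compact variant of the same idea, as a minimal sketch: it assumes the mean scores are always the last 8 columns, counts how often the most frequent value occurs in each row, and keeps the rows where that count stays below 3. The names `mean_cols`, `max_repeats` and `result` are illustrative and do not come from the question.

```python
import pandas as pd

# Example frame rebuilt from the question's reprex.
df = pd.DataFrame({
    'Column1': ['A', 'B', 'C', 'D'],
    'Column2': ['X', 'Y', 'Z', 'F'],
    'Mean1': [50, 20, 50, 40],
    'Mean2': [50, 21, 50, 41],
    'Mean3': [50, 22, 50, 42],
    'Mean4': [50, 23, 63, 43],
    'Mean5': [50, 24, 99, 44],
    'Mean6': [50, 25, 54, 45],
    'Mean7': [50, 26, 24, 46],
    'Mean8': [50, 27, 12, 47],
})

# Assumption: the mean scores are the last 8 columns.
mean_cols = df.columns[-8:]

# For each row, count how often its most frequent value occurs.
max_repeats = df[mean_cols].apply(lambda row: row.value_counts().max(), axis=1)

# Keep only rows where no value appears 3 or more times.
result = df[max_repeats < 3]
print(result)
```

With the example data this should leave only the rows with index 1 and 3. The threshold `3` can be changed if a different number of repeats should trigger the drop.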

Upvotes: 1
