Reputation: 2520
This question is almost what I need, but I can't adapt it to my case.
I have a DataFrame with many columns; the last 8 of them contain mean scores.
Example
Column1 Column2 Mean1 Mean2 Mean3 Mean4 Mean5 Mean6 Mean7 Mean8
0 A X 50 50 50 50 50 50 50 50
1 B Y 20 21 22 23 24 25 26 27
2 C Z 50 50 50 63 99 54 24 12
3 D F 40 41 42 43 44 45 46 47
Reprex
{'Column1': {0: 'A', 1: 'B', 2: 'C', 3: 'D'}, 'Column2': {0: 'X', 1: 'Y', 2: 'Z', 3: 'F'}, 'Mean1': {0: 50, 1: 20, 2: 50, 3: 40}, 'Mean2': {0: 50, 1: 21, 2: 50, 3: 41}, 'Mean3': {0: 50, 1: 22, 2: 50, 3: 42}, 'Mean4': {0: 50, 1: 23, 2: 63, 3: 43}, 'Mean5': {0: 50, 1: 24, 2: 99, 3: 44}, 'Mean6': {0: 50, 1: 25, 2: 54, 3: 45}, 'Mean7': {0: 50, 1: 26, 2: 24, 3: 46}, 'Mean8': {0: 50, 1: 27, 2: 12, 3: 47}}
I want to drop every row in which 3 or more of the 8 mean columns share the same value.
Expected output (rows 0 and 2 are dropped because the value 50 appears three or more times in each):
Column1 Column2 Mean1 Mean2 Mean3 Mean4 Mean5 Mean6 Mean7 Mean8
1 B Y 20 21 22 23 24 25 26 27
3 D F 40 41 42 43 44 45 46 47
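One way to express this rule is to count, for each row, how often its most frequent mean value occurs, then keep only rows where that count is below 3. The sketch below rebuilds the example frame and assumes the mean columns are always the last 8. It uses `Series.value_counts()` per row rather than the linked question's approach.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame(
    [["A", "X", *([50] * 8)],
     ["B", "Y", *range(20, 28)],
     ["C", "Z", 50, 50, 50, 63, 99, 54, 24, 12],
     ["D", "F", *range(40, 48)]],
    columns=["Column1", "Column2", *[f"Mean{i}" for i in range(1, 9)]],
)

# Assumption: the mean columns are always the last 8
means = df.iloc[:, -8:]

# For each row, how many times its most frequent value occurs
max_repeat = means.apply(lambda row: row.value_counts().max(), axis=1)

# Keep only rows where no value occurs 3 or more times
result = df[max_repeat < 3]
print(result)
```

If the mean columns are not always the last 8, `df.filter(like="Mean")` selects them by name instead.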
Upvotes: 1
Views: 90
Reputation: 235
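means_t = df.iloc[:, -8:].T  # only the 8 mean columns, transposed so each row becomes a column
n = list()
for number in means_t.columns.tolist():
    # size of the largest group of identical values in this row
    if means_t.groupby(number).size().max() >= 3:
        n.append(number)
df = df.drop(n)  # drop() returns a new DataFrame, so reassign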
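The loop above runs one `groupby` per row, which can be slow on a large frame. As an alternative (a different technique, not the answerer's), the check can be vectorized with NumPy: after sorting each row, a value occurs at least k times exactly when some element equals the element k-1 positions to its right. The threshold name `k` is introduced here for illustration, and the sketch again assumes the means are the last 8 columns.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame(
    [["A", "X", *([50] * 8)],
     ["B", "Y", *range(20, 28)],
     ["C", "Z", 50, 50, 50, 63, 99, 54, 24, 12],
     ["D", "F", *range(40, 48)]],
    columns=["Column1", "Column2", *[f"Mean{i}" for i in range(1, 9)]],
)

k = 3  # hypothetical threshold name; must be >= 2 for the slicing below

# Sort the values within each row (assumes the means are the last 8 columns)
vals = np.sort(df.iloc[:, -8:].to_numpy(), axis=1)

# In a sorted row, some value occurs k or more times exactly when
# an element equals the element k-1 positions further along
has_k_repeat = (vals[:, k - 1:] == vals[:, :-(k - 1)]).any(axis=1)

result = df[~has_k_repeat]
print(result)
```

Unlike the loop, this never calls back into Python per row; note that NaN never compares equal to itself, so repeated NaNs are not counted.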
Upvotes: 1