Jenny James

Reputation: 1

How to remove columns that have all values below a certain threshold

I am trying to remove any columns in my dataframe that do not have at least one value above 0.9. I know this probably isn't the most efficient way to do it, but I can't find the problem with it. I know it isn't correct because it only removes one column, and it should remove closer to 20. For each column, I count how many values are below 0.9, and if that count equals the number of values in the column, I drop the column. Thanks in advance.

for i in range(len(df3.columns)):
    count=0
    for j in df3.iloc[:,i].tolist():
        if j<.9:
            count+=1
    
    if len(df3.iloc[:,i].tolist())==count:
        df4=df3.drop(df3.columns[i], axis=1)
df4

Upvotes: 0

Views: 3241

Answers (1)

greco

Reputation: 325

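Your loop always drops from df3, so df4 is overwritten on every match and ends up missing only the last matching column. Instead, you can loop through each column in the dataframe and compare its maximum value against your threshold (0.9 in this case). If the maximum is not greater than 0.9, drop the column.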

The input:

    col1    col2    col3
0   0.2     0.8     1.0
1   0.3     0.5     0.5

Code:

import pandas as pd

# define dataframe
df = pd.DataFrame({'col1':[0.2, 0.3], 'col2':[0.8, 0.5], 'col3':[1, 0.5]})
# define threshold
threshold = 0.9

# loop through each column in dataframe
for col in df:
    # get the maximum value in column
    # check if it is less than or equal to the defined threshold
    if df[col].max() <= threshold:
        # if true, drop the column
        df = df.drop([col], axis=1)

This outputs:

    col3
0   1.0
1   0.5
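
If you would rather avoid the explicit loop, the same check can probably be written as a boolean mask over the columns. This is only a sketch on the sample data above, and it assumes every column is numeric. The names `keep` and `result` are just illustrative.

```python
import pandas as pd

# same sample data as above
df = pd.DataFrame({'col1': [0.2, 0.3], 'col2': [0.8, 0.5], 'col3': [1, 0.5]})
threshold = 0.9

# df.max() gives the maximum of each column as a Series;
# comparing it to the threshold yields one boolean per column
keep = df.max() > threshold

# keep only the columns whose maximum is above the threshold
result = df.loc[:, keep]
print(result)
```

Here `result` holds only col3, matching the loop version. Note that `max()` skips NaN values by default, so a column containing NaN is judged on its remaining values.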

Upvotes: 1
