Reputation: 1
I am trying to remove every column in my dataframe that does not have at least one value above 0.9. I know this probably isn't the most efficient way to do it, but I can't find the problem with it. I know it isn't correct because it only removes one column, and it should remove closer to 20. I count how many values in a column are below 0.9, and if that count equals the number of values in the column, I drop that column. Thanks in advance.
for i in range(len(df3.columns)):
    count=0
    for j in df3.iloc[:,i].tolist():
        if j<.9:
            count+=1
    if len(df3.iloc[:,i].tolist())==count:
        df4=df3.drop(df3.columns[i], axis=1)
df4
Upvotes: 0
Views: 3241
Reputation: 325
You can loop through the columns of the dataframe and compare each column's maximum value against your threshold (0.9 in this case). If the maximum is not greater than 0.9, then no value in that column exceeds the threshold, so drop the column.
The input:
   col1  col2  col3
0   0.2   0.8   1.0
1   0.3   0.5   0.5
Code:
import pandas as pd

# define dataframe
df = pd.DataFrame({'col1':[0.2, 0.3], 'col2':[0.8, 0.5], 'col3':[1, 0.5]})
# define threshold
threshold = 0.9
# loop through each column in dataframe
for col in df:
    # get the maximum value in column
    # check if it is less than or equal to the defined threshold
    if df[col].max() <= threshold:
        # if true, drop the column
        df = df.drop([col], axis=1)
This outputs:
   col3
0   1.0
1   0.5
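The same filter can probably be written without an explicit loop. The following is only a sketch that reuses the sample data from above; the names `keep` and `result` are mine, not from the question. It builds a boolean mask with one entry per column and selects the columns where at least one value is above the threshold.

```python
import pandas as pd

# Same sample data as in the loop-based example
df = pd.DataFrame({'col1': [0.2, 0.3], 'col2': [0.8, 0.5], 'col3': [1, 0.5]})
threshold = 0.9

# Boolean Series indexed by column name:
# True where at least one value in the column exceeds the threshold
keep = (df > threshold).any()

# Select only the columns flagged True
result = df.loc[:, keep]
print(result)
```

As a side note, the loop in the question assigns `df4 = df3.drop(...)` from the unmodified `df3` on every match, so `df4` only ever reflects the most recent drop. That would explain why only one column appears to be removed.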
Upvotes: 1