Reputation: 480
In my data frame I need to remove columns that contain specific characters. To find those columns, I am trying to write a for loop in Python that iterates over each column and drops it if it contains the unwanted characters. My data frame looks like this, and I need to drop col3 and col5, which contain both 'f' and 't':
col1 col2 col3 col4 col5 col6
1245 pink f Mar f f
245 green f Feb t f
1237 grey t Apr f f
267 black f Sep t f
I am trying to write a script similar to this
for col in df.items():
    if df[col] == 'f'
        df = df.drop([col], axis=1)
Upvotes: 1
Views: 371
Reputation: 12417
You can create a boolean mask of the columns that contain only 'f' or 't', and then apply the mask to the df:
mask = ((df == 'f') | (df=='t')).all(0)
df = df[df.columns[~mask]]
The mask above also drops col6, because every value in it is 'f'. If you want to keep col6, you can do this:
mask0 = ((df == 'f') | (df == 't')).all(0)
mask1 = (df == 'f').all(0)
df0 = df[df.columns[~mask0]]
df1 = df[df.columns[mask1]]
df = pd.concat([df0, df1], axis=1)
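The two masks can also be combined into a single column selection, which avoids the concat step. This is only a minimal sketch: the data frame is rebuilt by hand from the table in the question, and the names only_flags, all_f and result are made up for illustration.

```python
import pandas as pd

# Data frame rebuilt by hand from the question's table (an assumption
# about how the original data is typed and loaded).
df = pd.DataFrame({
    "col1": [1245, 245, 1237, 267],
    "col2": ["pink", "green", "grey", "black"],
    "col3": ["f", "f", "t", "f"],
    "col4": ["Mar", "Feb", "Apr", "Sep"],
    "col5": ["f", "t", "f", "t"],
    "col6": ["f", "f", "f", "f"],
})

# True for columns in which every value is 'f' or 't' (col3, col5, col6).
only_flags = ((df == "f") | (df == "t")).all(axis=0)

# True for columns in which every value is 'f' (col6 only).
all_f = (df == "f").all(axis=0)

# Keep a column if it is not a pure f/t column, or if it is all 'f'.
result = df.loc[:, ~only_flags | all_f]
print(result.columns.tolist())
```

Because the selection goes through a single boolean mask, the kept columns stay in their original order.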
Upvotes: 1
Reputation: 92894
Using pd.DataFrame.loc and pd.DataFrame.any:
In [196]: df
Out[196]:
col1 col2 col3 col4 col5
0 1245 pink t Mar f
1 245 green f Feb t
2 1237 grey f Apr f
3 267 black f Sep f
4 111 red t Aug t
In [197]: df.loc[:, ~((df == 'f') | (df == 't')).any(axis=0)]
Out[197]:
col1 col2 col4
0 1245 pink Mar
1 245 green Feb
2 1237 grey Apr
3 267 black Sep
4 111 red Aug
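Since the question asked for a loop, the same rule (drop every column that contains 'f' or 't' anywhere) can also be written by iterating over the column names. This is a rough sketch rather than a recommended replacement for the one-liner: the data frame is retyped from the output above, and unwanted, to_drop and result are illustrative names.

```python
import pandas as pd

# Data frame retyped from the IPython output above (an assumption about
# the underlying dtypes).
df = pd.DataFrame({
    "col1": [1245, 245, 1237, 267, 111],
    "col2": ["pink", "green", "grey", "black", "red"],
    "col3": ["t", "f", "f", "f", "t"],
    "col4": ["Mar", "Feb", "Apr", "Sep", "Aug"],
    "col5": ["f", "t", "f", "f", "t"],
})

unwanted = ["f", "t"]

# Collect the names first instead of dropping inside the loop, so the
# frame is not modified while it is being iterated over.
to_drop = [col for col in df.columns if df[col].isin(unwanted).any()]

result = df.drop(columns=to_drop)
print(to_drop)
print(result.columns.tolist())
```

Iterating over df.columns yields column names, whereas df.items() in the original attempt yields (name, Series) pairs, which is why df[col] fails there.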
Upvotes: 1