guitarman

Reputation: 175

Need to Remove Column based on Row value in pandas

I'm trying to remove the columns that have a 0 in the 2nd row of the following dataframe snippet (the real dataframe has many more columns than this):

   1st Year Gender CDK6  1st Year Gender GBP1  1st Year Gender LY9  Future All CCDC144B
0                     1                     1                    1                    0
1                     0                     1                    0                    1

I simply need to remove the columns where the 2nd row has a 0 in it. The result will be:

   1st Year Gender GBP1  Future All CCDC144B
0                     1                    0
1                     1                    1

I have code that collects the column names and then attempts to drop them, but I am getting a KeyError.

drop_columns = []
for x in percent_scoring:
    if percent_scoring[x][1] == 0:
        drop_columns.append(x)

for x in drop_columns:
    percent_scoring = percent_scoring.drop(columns=x)

but I get an unexpected KeyError:

KeyError: "['1st Year All CDK6', '1st Year Gender CDK6', '1st Year Gender LY9'] not in index"

I'm not sure why I get the KeyError, but an easy way to do this would be appreciated. I couldn't find any info on this task, which seems like it should be simple. Thanks.

Upvotes: 1

Views: 1153

Answers (3)

bkeesey

Reputation: 496

I would use loc and iloc to just select all columns that do not have a 0 value in the second row.

import pandas as pd

# Create dummy DataFrame
d = {'col1': [0, 2], 'col2': [3, 0], 'col4': [3, 1], 'col5': [0, 0]}
df = pd.DataFrame(data=d)

   col1  col2  col4  col5
0     0     3     3     0
1     2     0     1     0

# Select all columns where the second row doesn't equal 0
new_df = df.loc[:, ~(df.iloc[1] == 0)]
print(new_df)

   col1  col4
0     0     3
1     2     1
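
Applied to the frame from the question, the same idea can be sketched as follows. This assumes the dataframe is called percent_scoring and contains only the four columns shown in the question; the real frame has more columns, so treat it as an illustration rather than a drop-in fix.

```python
import pandas as pd

# Sample frame copied from the question (only the four columns shown there).
percent_scoring = pd.DataFrame({
    "1st Year Gender CDK6": [1, 0],
    "1st Year Gender GBP1": [1, 1],
    "1st Year Gender LY9": [1, 0],
    "Future All CCDC144B": [0, 1],
})

# Boolean mask with one entry per column: True where the second row is non-zero.
keep = percent_scoring.iloc[1] != 0

# Keep every row, but only the columns where the mask is True.
result = percent_scoring.loc[:, keep]
print(result)
```

Writing the mask as != 0 is equivalent to ~(... == 0) here and avoids the extra negation.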

Upvotes: 1

S_Koen

Reputation: 36

I just tried your code and got no error. Maybe compare my results with yours:

import pandas as pd

d = {'1st Year Gender CDK6': [1, 0], '1st Year Gender GBP1': [1, 1], '1st Year Gender LY9': [1, 0], 'Future All CCDC144B': [0, 1]}
df = pd.DataFrame(data=d)

drop_columns = []
for x in df:
    if df[x][1] == 0:
        drop_columns.append(x)

for x in drop_columns:
    df = df.drop(columns=x)

df before:

   1st Year Gender CDK6  1st Year Gender GBP1  1st Year Gender LY9  Future All CCDC144B
0                     1                     1                    1                    0
1                     0                     1                    0                    1

after:

   1st Year Gender GBP1  Future All CCDC144B
0                     1                    0
1                     1                    1
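
Since the loop works on a fresh frame, one possible explanation (an assumption, the question does not confirm it) is that the drop was run against a frame from which those columns had already been removed, for example by re-running a notebook cell. A minimal sketch that drops all matching columns in one call and tolerates names that are already gone:

```python
import pandas as pd

df = pd.DataFrame({
    "1st Year Gender CDK6": [1, 0],
    "1st Year Gender GBP1": [1, 1],
    "1st Year Gender LY9": [1, 0],
    "Future All CCDC144B": [0, 1],
})

# Names of the columns whose second row (position 1) is 0.
drop_columns = [col for col in df.columns if df[col].iloc[1] == 0]

# Drop them in a single call; errors="ignore" skips names that are not present.
df = df.drop(columns=drop_columns, errors="ignore")

# Running the same drop again is now a no-op instead of raising a KeyError.
df = df.drop(columns=drop_columns, errors="ignore")
print(df)
```

Note that errors="ignore" also hides genuine typos in column names, so it may be worth leaving it off while debugging.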

Upvotes: 0

Sherar MDP

Reputation: 321

Instead of

percent_scoring = percent_scoring.drop(columns=x)

try:

del percent_scoring[x]
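
In context, that suggestion might look like the sketch below, again using only the four columns shown in the question as sample data. Keep in mind that del modifies the frame in place and still raises a KeyError if the column is no longer there, so it does not by itself protect against dropping the same column twice.

```python
import pandas as pd

percent_scoring = pd.DataFrame({
    "1st Year Gender CDK6": [1, 0],
    "1st Year Gender GBP1": [1, 1],
    "1st Year Gender LY9": [1, 0],
    "Future All CCDC144B": [0, 1],
})

# Collect the names first so the frame is not modified while iterating over it.
drop_columns = [
    col for col in percent_scoring.columns
    if percent_scoring[col].iloc[1] == 0
]

for x in drop_columns:
    # Removes the column in place; raises KeyError if x is not a column.
    del percent_scoring[x]

print(percent_scoring)
```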

Upvotes: 0
