Python, Pandas to remove rows with specific value in designated columns

Question

An Excel spreadsheet looked like below.

With Pandas, I want to remove the columns “Project C” and “Project E”, and all rows with value “XX” in Columns “Project A” and “Project D”.

import pandas as pd
import numpy as np

work_file = "C:\test.xlsx"

df = pd.read_excel(work_file, sheetname = "Sheet1", index_col = 0)

column_list_to_remove = [“Project C", “Project E"]

results1 = df.drop(column_list_to_remove, axis=1)

writer = pd.ExcelWriter("C:\test new.xlsx")

pd.formats.format.header_style = None

results1.to_excel(writer,'Sheet1')

writer.save()

Above work well in removing the columns.

I try added on this line to remove rows with value “XX” in “Project A”, and failed. Without hope to remove those in “Project D”.

results1 = results1[results1."Project A" != "XX"]
# SyntaxError: invalid syntax

How can I remove those rows? Thank you.

Uvar · Accepted Answer

If your column names did not have whitespace, you could have done something along the lines of:

results1 = results1.query("ColumnName != 'XX'")

or alternatively inplace:

results1.query("ColumnName != 'XX'", inplace=True)

Alas, they do. Now you can either rename those columns, as whitespace is evil incarnate, or do:

results1 = results1[results1['Project A'] != 'XX']

Basically: create a mask for the index and slice the dataframe.

Renaming your columns without the whitespace can be done in a oneliner as well:

results1.columns = [''.join(x.strip().split()) for x in results1.columns]

Then you can proceed with something like:

column_list = ['ProjectA', 'ProjectD']
for col in column_list:
    results1.query(col+"!='XX'", inplace=True)

Python, Pandas to remove rows with specific value in designated columns

Answers (1)

Related Questions