Bernardo Siv
Bernardo Siv

Reputation: 1

how you use .query with pandas and numpy?

From the file output_hc_v1.csv with headers :emplid pay_status count location deptid grade_desc date version

I would like via python save a new csv filtered on "pay_status=Active". with this code:

a1=a.query ('"pay_status" == ["Active"]')

=========================================

but this part is not working.

My full code:
import csv 
import pandas as pd
import numpy as np

df= pd.read_csv ("C:/Users/e1087353/Documents/CAPCO/HC/python_data/output_hc_v1.csv")
df.head()
a= pd.pivot_table(df, index =["location","deptid","grade_desc","pay_status"] ,values=["Count"],aggfunc=np.sum)
a1=a.query ('"pay_status" == ["Active"]')
a1.to_csv ("C:/Users/e1087353/Documents/CAPCO/HC/python_data/Output_final2.csv")

Upvotes: 0

Views: 1907

Answers (2)

unutbu
unutbu

Reputation: 881037

You do not need to quote column names in the query string. Therefore you could use,

a1 = a.query('pay_status == "Active"')

Upvotes: 1

Nickpick
Nickpick

Reputation: 6597

Don't use query. To filter simply do:

a1=a[a['pay_status']=='Active']

If you create a pivot table you may want to reset the index before you filter: a=a.reset_index()

Alternatively, for more complicated queries, if you want to filter for items in a list use isin:

filter_list=['Active','Inactive','Something']
a1=a[a['pay_status'].isin(filter_list)]

Upvotes: 0

Related Questions