Reputation: 189
I am trying to find a more pandorable way to get all rows of a DataFrame past a certain value in a certain column (the Quarter column in this case).
I want to slice a DataFrame of GDP statistics to get all rows from the first quarter of 2000 (2000q1) onward. Currently, I'm doing this by getting the index number of the value in the GDP_df["Quarter"] column that equals 2000q1 (see below). This seems far too convoluted, and there must be a simpler, more idiomatic way to achieve this. Any ideas?
Current Method:
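import pandas as pd

def get_GDP_df():
    # usecols is the current name for the older parse_cols argument
    GDP_df = pd.read_excel(
        "gdplev.xls",
        names=["Quarter", "GDP in 2009 dollars"],
        usecols="E,G", skiprows=7)
    # index label of the first row whose Quarter is 2000q1
    year_2000 = GDP_df.index[GDP_df["Quarter"] == '2000q1'].tolist()[0]
    # quarter-over-quarter growth, formatted as a percentage string
    GDP_df["Growth"] = (GDP_df["GDP in 2009 dollars"]
                        .pct_change()
                        .apply(lambda x: f"{round((x * 100), 2)}%"))
    GDP_df = GDP_df[year_2000:]
    return GDP_df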
Output:
Also, after the DataFrame has been sliced, the indices now start at 212. Is there a method to renumber the indices so they start at 0 or 1?
Upvotes: 3
Views: 244
Reputation: 2490
As pointed out in the comments, you can use the query() method, which queries the columns of a DataFrame with a boolean expression. It uses the top-level pandas.eval() function to evaluate the passed query; pandas.eval() in turn evaluates a Python expression given as a string, using various backends.
import pandas as pd

raw_data = {'ID':['101','101','101','102','102','102','102','103','103','103','103'],
            'Week':['08-02-2000','09-02-2000','11-02-2000','10-02-2000','09-02-2000','08-02-2000','07-02-2000','01-02-2000',
                    '02-02-2000','03-02-2000','04-02-2000'],
            'Quarter':['2000q1','2000q2','2000q3','2000q4','2000q1','2000q2','2000q3','2000q4','2000q1','2000q2','2000q3'],
            'GDP in 2000 dollars':[15,15,10,15,15,5,10,10,15,20,11]}

def get_GDP_df():
    GDP_df = pd.DataFrame(raw_data).set_index('ID')
    print(GDP_df)  # printed for reference, to show how the data is indexed
    # run the query, then renumber the index starting from 0
    GDP_df = GDP_df.query("Quarter >= '2000q1'").reset_index(drop=True)
    GDP_df["Growth"] = (GDP_df["GDP in 2000 dollars"]
                        .pct_change()
                        .apply(lambda x: f"{round((x * 100), 2)}%"))
    return GDP_df

get_GDP_df()
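Since every row in the sample data above already satisfies the condition, here is a minimal sketch with rows on both sides of the boundary. The data and the names df, via_query, via_mask and renumbered are made up for illustration and are not taken from gdplev.xls. It suggests that query() selects the same rows as a plain boolean mask, and that reset_index(drop=True) renumbers the result from 0.

```python
import pandas as pd

# Hypothetical sample data (not from gdplev.xls) spanning the 2000q1 boundary.
df = pd.DataFrame({
    "Quarter": ["1999q3", "1999q4", "2000q1", "2000q2", "2000q3"],
    "GDP": [100.0, 102.0, 103.0, 105.0, 104.0],
})

# query() and a plain boolean mask select the same rows.
# The comparison is on strings, so it is lexicographic; it matches
# chronological order only because every label has the form 'YYYYqN'.
via_query = df.query("Quarter >= '2000q1'")
via_mask = df[df["Quarter"] >= "2000q1"]

# The filtered frame keeps its original labels (2, 3, 4 here);
# reset_index(drop=True) renumbers from 0 and discards the old labels.
renumbered = via_query.reset_index(drop=True)
print(renumbered)
```

Note that if Growth should compare 2000q1 against 1999q4, as in the question, pct_change() would need to run before the filter rather than after it.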
Upvotes: 1
Reputation: 375375
The following is equivalent:
# inside get_GDP_df(), after GDP_df has been read from the file
year_2000 = (GDP_df["Quarter"] == '2000q1').idxmax()  # label of the first match
GDP_df["Growth"] = (GDP_df["GDP in 2009 dollars"]
                    .pct_change()
                    .mul(100)
                    .round(2)
                    .apply(lambda x: f"{x}%"))
return GDP_df.loc[year_2000:]
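To see why the two lookups agree, here is a small sketch on made-up data. The frame, its index labels, and the variable names are all hypothetical. It also illustrates one caveat worth keeping in mind: on a boolean mask with no True values, idxmax() still returns the first label, so a check with any() may be worth adding.

```python
import pandas as pd

# Hypothetical sample data with a non-zero-based index.
df = pd.DataFrame(
    {"Quarter": ["1999q3", "1999q4", "2000q1", "2000q2"],
     "GDP": [100.0, 102.0, 103.0, 105.0]},
    index=[210, 211, 212, 213],
)

mask = df["Quarter"] == "2000q1"

# Both expressions give the label of the first matching row.
label_from_index = df.index[mask].tolist()[0]
label_from_idxmax = mask.idxmax()

# .loc slices by label and includes the starting label.
tail = df.loc[label_from_idxmax:]

# Caveat: with no match, idxmax() still returns the first label,
# so checking mask.any() first avoids silently slicing from the top.
no_match = df["Quarter"] == "2050q1"
if not no_match.any():
    print("no matching quarter; idxmax() would return", no_match.idxmax())
```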
Upvotes: 1