Reputation: 47
I have a DataFrame called df_civic with the columns State, Rank, Make/Model, Model Year, and Thefts. I want to calculate the mean (AVG) and standard deviation (STD) of Thefts for each Model Year.
All the years present in the DataFrame are collected with: years_civic = list(pd.unique(df_civic['Model Year']))
My loop looks like this:
for civic_year in years_civic:
    f = df_civic['Model Year'] == civic_year
    civic_avg = df_civic[f]['Thefts'].mean()
    civic_std = df_civic[f]['Thefts'].std()
    civic_std = np.round(car_std, 2)
    civic_avg = np.round(car_avg, 2)
    print(civic_avg, civic_std, np.sum(f))
However, the output is not what I need: the only value that is correct is the one from np.sum(f).
The output currently looks like this:
9.0 20.51 1
9.0 20.51 1
9.0 20.51 1
9.0 20.51 1
9.0 20.51 13
9.0 20.51 15
9.0 20.51 3
9.0 20.51 2
Upvotes: 0
Views: 53
Reputation: 809
Pandas provides many powerful functions for aggregating data. It is usually better to reach for these functions before writing for loops.
For instance, you can use:
import pandas as pd
# "Thefts" matches the column name used in your loop; "std" is the sample standard deviation
df_civic.groupby("Model Year").agg({"Thefts": ["mean", "std"]})
More doc here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html
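To make the groupby approach concrete, here is a minimal sketch on a small made-up DataFrame. The sample values and the output column names avg, std and n are assumptions chosen for illustration, not your real data. It also adds a row count per year, which corresponds to the np.sum(f) value in your loop.

```python
import pandas as pd

# Hypothetical sample data; the real df_civic has more columns and rows.
df_civic = pd.DataFrame({
    "Model Year": [1998, 1998, 1998, 2000, 2000, 2001],
    "Thefts": [10, 20, 30, 5, 15, 7],
})

# One row per model year: mean, sample standard deviation, and row count.
summary = (
    df_civic.groupby("Model Year")["Thefts"]
    .agg(avg="mean", std="std", n="count")
    .round(2)
)
print(summary)
```

Note that pandas computes the sample standard deviation (ddof=1) by default, so a model year with a single row gets NaN for std. That is expected behaviour rather than an error.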
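Regarding your code, there is something odd: the two np.round lines use car_std and car_avg, which are not defined anywhere in the loop. They presumably still hold values from earlier code in your session, which would explain why the same 9.0 and 20.51 are printed on every iteration. Rounding civic_std and civic_avg instead should fix it.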
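If you want to keep the loop, a corrected version could look like the sketch below. The sample DataFrame and the results list are assumptions added so the snippet runs on its own; only the variable names inside np.round differ from your original logic.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; replace with your real df_civic.
df_civic = pd.DataFrame({
    "Model Year": [1998, 1998, 1998, 2000, 2000, 2001],
    "Thefts": [10, 20, 30, 5, 15, 7],
})

results = []  # hypothetical helper list, kept so the values can be reused later
for civic_year in pd.unique(df_civic["Model Year"]):
    f = df_civic["Model Year"] == civic_year  # boolean mask for this year
    # Round the values computed in this iteration (civic_*, not car_*).
    civic_avg = np.round(df_civic.loc[f, "Thefts"].mean(), 2)
    civic_std = np.round(df_civic.loc[f, "Thefts"].std(), 2)
    results.append((civic_year, civic_avg, civic_std, int(np.sum(f))))
    print(civic_avg, civic_std, np.sum(f))
```

For years with only one row, civic_std will be NaN, because the sample standard deviation is undefined for a single observation.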
Upvotes: 1