Aleksander Kuś

Reputation: 47

For loop over dataframe python

I have a DataFrame called df_civic with the columns state, rank, make/model, model year, and thefts. I want to calculate the mean and standard deviation of thefts for each model year.

I collect all the years present in the DataFrame with: years_civic = list(pd.unique(df_civic['Model Year']))

My loop looks like this:

for civic_year in years_civic:
    f = df_civic['Model Year'] == civic_year
    civic_avg = df_civic[f]['Thefts'].mean()
    civic_std = df_civic[f]['Thefts'].std()
    civic_std= np.round(car_std,2)
    civic_avg= np.round(car_avg,2)
    print(civic_avg, civic_std, np.sum(f))

However, the output is not what I need. The only value that is correct is the one from np.sum(f).

The output currently looks like this:

9.0 20.51 1
9.0 20.51 1
9.0 20.51 1
9.0 20.51 1
9.0 20.51 13
9.0 20.51 15
9.0 20.51 3
9.0 20.51 2

Upvotes: 0

Views: 53

Answers (1)

Nicoowr

Reputation: 809

Pandas provides many powerful functions for aggregating your data. It's usually better to first think of these functions before using for loops.

For instance, you can use:

import pandas as pd

# Mean and standard deviation of Thefts for each model year
df_civic.groupby("Model Year").agg({"Thefts": ["mean", "std"]})

More documentation here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html
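To see what the grouped aggregation returns, here is a minimal, self-contained sketch. The sample values are made up for illustration; only the column names 'Model Year' and 'Thefts' come from the question. It also adds a count per year, which plays the role of np.sum(f) in the original loop.

```python
import pandas as pd

# Hypothetical sample data standing in for the real df_civic.
df_civic = pd.DataFrame({
    "Model Year": [1998, 1998, 1998, 2000, 2000, 2001],
    "Thefts": [10, 20, 30, 5, 15, 9],
})

# One row per model year with the mean, sample standard deviation,
# and number of rows, rounded to two decimals.
summary = (
    df_civic.groupby("Model Year")["Thefts"]
    .agg(["mean", "std", "count"])
    .round(2)
)
print(summary)
```

Note that a year with a single row gets NaN for the standard deviation, because pandas computes the sample standard deviation (ddof=1) by default.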

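Regarding your code: the two np.round calls use car_std and car_avg, which are not defined in the snippet. They are presumably left over from earlier code, which would explain why the same two values are printed on every iteration. You probably meant civic_std and civic_avg.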
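If you want to keep the loop, a corrected version might look like the sketch below. It assumes the intent was to round the values computed in the same iteration; the sample data and the results list are hypothetical additions so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real df_civic.
df_civic = pd.DataFrame({
    "Model Year": [1998, 1998, 1998, 2000, 2000, 2001],
    "Thefts": [10, 20, 30, 5, 15, 9],
})

years_civic = list(pd.unique(df_civic["Model Year"]))

results = []  # hypothetical helper list, kept only so the values can be inspected
for civic_year in years_civic:
    f = df_civic["Model Year"] == civic_year  # boolean mask for this year
    # Round the values computed for this year, not leftover variables.
    civic_avg = np.round(df_civic.loc[f, "Thefts"].mean(), 2)
    civic_std = np.round(df_civic.loc[f, "Thefts"].std(), 2)
    results.append((civic_year, civic_avg, civic_std, int(np.sum(f))))
    print(civic_avg, civic_std, np.sum(f))
```

The groupby version is still shorter and avoids filtering the DataFrame once per year.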

Upvotes: 1
