Find average of row and column groups pandas

Question

I want to find the states with the highest average total revenue, and to be able to see which states have the 40th-45th highest average, the 35th-40th, and so on, for all states from 1992 to 2016.

The data is organized in a dataframe, shown in the picture below. Ideally I would add another column like the following; I think this is what I am trying to do.

STATE // YEAR // TOTAL_REVENUE // AVG_TOTAL_REVENUE
ALABAMA // 1992 // 5000 // 6059
ALABAMA // 1993 // 4000 // 6059
ALASKA // 1992 // 3000 // 2059
ALABAMA // 1996 // 6019 // 6059

Is this possible? I am not sure whether I am describing what I want correctly, and I am not sure what to search for on Google to find a way forward.

Anna Nevison · Accepted Answer

Assuming your input looks like:

STATE       YEAR    TOTAL_REVENUE
Michigan    2001    1000
Michigan    2002    2000
California  2003    3000
California  2004    4000
Michigan    2005    5000

Then just do:

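import numpy as np

# Start with an empty column; it is filled in one state at a time
df['AVG_TOTAL_REVENUE'] = np.nan

# Unique state names
states = df['STATE'].unique()

for state in states:
    # Rows belonging to this state
    state_values = df[df['STATE'] == state]
    revenues = [float(x) for x in state_values['TOTAL_REVENUE']]
    avg = sum(revenues) / len(revenues)
    # Write the average into this state's rows with a single .loc call
    # (avoids chained assignment, which may write to a temporary copy)
    df.loc[state_values.index, 'AVG_TOTAL_REVENUE'] = avg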

which gives you:

        STATE  YEAR  TOTAL_REVENUE  AVG_TOTAL_REVENUE
0    Michigan  2001           1000        2666.666667
1    Michigan  2002           2000        2666.666667
2  California  2003           3000        3500.000000
3  California  2004           4000        3500.000000
4    Michigan  2005           5000        2666.666667
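
For what it's worth, the same column can probably be built without an explicit loop by using groupby with transform, and the per-state averages can then be ranked to pick out bands such as the 41st to 45th highest. The following is only a sketch under the assumption that the column names match the sample above; the helper states_in_rank_range and the variable names state_avg and state_rank are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Sample data copied from the answer above
df = pd.DataFrame({
    "STATE": ["Michigan", "Michigan", "California", "California", "Michigan"],
    "YEAR": [2001, 2002, 2003, 2004, 2005],
    "TOTAL_REVENUE": [1000, 2000, 3000, 4000, 5000],
})

# transform("mean") returns one value per original row,
# so it can be assigned straight back as a new column
df["AVG_TOTAL_REVENUE"] = df.groupby("STATE")["TOTAL_REVENUE"].transform("mean")

# One value per state, sorted from highest to lowest average
state_avg = (
    df.groupby("STATE")["TOTAL_REVENUE"]
    .mean()
    .sort_values(ascending=False)
)

# Rank 1 = highest average; method="first" breaks ties by order of appearance
state_rank = state_avg.rank(ascending=False, method="first").astype(int)


# Hypothetical helper: states whose rank falls between lo and hi, inclusive
def states_in_rank_range(ranks, lo, hi):
    return ranks[(ranks >= lo) & (ranks <= hi)].index.tolist()
```

With the full data set, something like states_in_rank_range(state_rank, 41, 45) should list the states in that band. To restrict the averages to 1992-2016, the frame could first be filtered with df[df["YEAR"].between(1992, 2016)].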
