Wocky Bocky

Reputation: 19

Cumulative average in python

I'm working with csv files.

I'd like to create a continuously updated (cumulative) average of a sequence. That is, for each position in a list, I'd like to output the average of all values up to and including that position.

List: [a, b, c, d, e, f]
Formula:

(a)/1= ?

(a+b)/2=?

(a+b+c)/3=?

(a+b+c+d)/4=?

(a+b+c+d+e)/5=?

(a+b+c+d+e+f)/6=?

To demonstrate:

If I have the list [1, 4, 7, 4, 19]

my output should be [1, 2.5, 4, 4, 7]

Explained:

(1)/1=1

(1+4)/2=2.5

(1+4+7)/3=4

(1+4+7+4)/4=4

(1+4+7+4+19)/5=7

My Python file is a simple script:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('somecsvfile.csv')

# x has to be a list from 1 to however many rows are in the "numbers" column,
# i.e. simply [1, 2, 3, 4, 5] etc.
# x will be used to divide the numbers selected in y to give us z
x = []

y = df['numbers']

z = ...  # placeholder: new data derived from the continuous average of y

plt.plot(x, z)
plt.show()

If numpy is needed that is no problem.

Upvotes: 2

Views: 3140

Answers (3)

Augusto Quaglia

Reputation: 66

pandas.DataFrame.expanding is what you need.

Using it you can just call df.expanding().mean() to get the result you want:

mean = df.expanding().mean()

print(mean)

Out[10]: 
0   1.0
1   2.5
2   4.0
3   4.0
4   7.0

If you want to do it for just one column, use pandas.Series.expanding.

Just call it on the column instead of on df:

df['column_name'].expanding().mean()
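
As a minimal sketch of the single-column version, assuming the column is called 'numbers' as in the question's code and using the question's sample values in place of the CSV file:

```python
import pandas as pd

# Sample data from the question; the column name 'numbers' is an assumption
df = pd.DataFrame({'numbers': [1, 4, 7, 4, 19]})

# Expanding window: each row is averaged with all rows before it
cum_avg = df['numbers'].expanding().mean()

print(cum_avg.tolist())  # [1.0, 2.5, 4.0, 4.0, 7.0]
```

The result is a Series with the same index as df, so it can be passed straight to plt.plot or assigned back as a new column.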

Upvotes: 4

ramzeek

Reputation: 2315

To give a complete answer to your question, filling in the blanks of your code using numpy and plotting:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

#df = pd.read_csv('somecsvfile.csv')
#instead I just create a df with a column named 'numbers'
df = pd.DataFrame([1, 4, 7, 4, 19], columns = ['numbers',])

x = range(1, len(df)+1)  #x will be used to divide the numbers selected in y to give us z

y = df['numbers']
z = np.cumsum(y) / np.array(x)

plt.plot(x, z, 'o')
plt.xticks(x)
plt.xlabel('Entry')
plt.ylabel('Cumulative average')
plt.show()

[Image: cumulative average plot]

But as pointed out by Augusto, you can also just put the whole thing into a DataFrame. Adding a bit more to his approach:

n = [1, 4, 7, 4, 19]
df = pd.DataFrame(n, columns = ['numbers',])
# shift the index so it starts at 1, as you want
df.index = np.arange(1, len(df)+1)

# create a new column for the cumulative average
df = df.assign(cum_avg = df['numbers'].expanding().mean())
#    numbers  cum_avg
# 1        1      1.0
# 2        4      2.5
# 3        7      4.0
# 4        4      4.0
# 5       19      7.0

# plot
df['cum_avg'].plot(linestyle = 'none',
                   marker = 'o',
                   xticks = df.index,
                   xlabel = 'Entry',
                   ylabel = 'Cumulative average')

Upvotes: 0

mujjiga

Reputation: 16866

You can use cumsum to get the cumulative sum and then divide by the number of elements seen so far to get the running average.

import numpy as np

x = np.array([1, 4, 7, 4, 19])
z = np.cumsum(x) / np.arange(1, len(x) + 1)
print(z)

output:

[1.  2.5 4.  4.  7. ]
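
For comparison, the same cumulative-sum-then-divide idea can be sketched with only the standard library. This is an alternative to the numpy call above, not a replacement for it, and the variable names are just illustrative:

```python
from itertools import accumulate

values = [1, 4, 7, 4, 19]

# accumulate yields the running sums: 1, 5, 12, 16, 35
running_sums = accumulate(values)

# Divide each running sum by the number of values it covers (1-based position)
z = [total / n for n, total in enumerate(running_sums, start=1)]

print(z)  # [1.0, 2.5, 4.0, 4.0, 7.0]
```

For large columns the numpy or pandas versions will likely be faster, since they avoid a Python-level loop.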

Upvotes: 0
