Reputation: 19
I'm working with CSV files.
I'd like to create a continuously updated (running) average of a sequence. That is, for each position in a list, I'd like to output the average of all values up to and including that position. For example:
list: [a, b, c, d, e, f]
formula:
(a)/1= ?
(a+b)/2=?
(a+b+c)/3=?
(a+b+c+d)/4=?
(a+b+c+d+e)/5=?
(a+b+c+d+e+f)/6=?
To demonstrate, if I have the list [1, 4, 7, 4, 19], my output should be [1, 2.5, 4, 4, 7].
Explained:
(1)/1=1
(1+4)/2=2.5
(1+4+7)/3=4
(1+4+7+4)/4=4
(1+4+7+4+19)/5=7
My Python file is simple:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('somecsvfile.csv')
x = [] #has to be a list of 1 to however many rows are in the "numbers" column, will be a simple [1, 2, 3, 4, 5] etc...
#x will be used to divide the numbers selected in y to give us z
y = df['numbers']
z = ... # placeholder: new dataframe derived from the continuous average of y
plt.plot(x, z)
plt.show()
If numpy is needed that is no problem.
Upvotes: 2
Views: 3140
Reputation: 66
pandas.DataFrame.expanding is what you need.
With it, you can just call df.expanding().mean() to get the result you want:
mean = df.expanding().mean()
print(mean)
Out[10]:
0 1.0
1 2.5
2 4.0
3 4.0
4 7.0
If you want to do it for just one column, use pandas.Series.expanding. Just use the column instead of df:
df['column_name'].expanding().mean()
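As a rough end-to-end sketch of the above, here is the expanding mean applied to the sample list from the question. The column name 'numbers' and the in-memory DataFrame are assumptions standing in for the real CSV file:

```python
import pandas as pd

# Assumed stand-in for pd.read_csv('somecsvfile.csv') with a 'numbers' column
df = pd.DataFrame({"numbers": [1, 4, 7, 4, 19]})

# Expanding window: each row sees every value from the start up to itself
running_mean = df["numbers"].expanding().mean()

# 1-based positions, usable as the x axis when plotting
x = list(range(1, len(df) + 1))

print(running_mean.tolist())
```

The resulting Series keeps the index of the original column, so it can be passed straight to plt.plot(x, running_mean).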
Upvotes: 4
Reputation: 2315
To give a complete answer to your question, filling in the blanks of your code using numpy and plotting:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
#df = pd.read_csv('somecsvfile.csv')
#instead I just create a df with a column named 'numbers'
df = pd.DataFrame([1, 4, 7, 4, 19], columns = ['numbers',])
x = range(1, len(df)+1) #x will be used to divide the numbers selected in y to give us z
y = df['numbers']
z = np.cumsum(y) / np.array(x)
plt.plot(x, z, 'o')
plt.xticks(x)
plt.xlabel('Entry')
plt.ylabel('Cumulative average')
plt.show()
But as pointed out by Augusto, you can also just put the whole thing into a DataFrame. Adding a bit more to his approach:
n = [1, 4, 7, 4, 19]
df = pd.DataFrame(n, columns = ['numbers',])
#augment the index so it starts at 1 like you want
df.index = np.arange(1, len(df)+1)
# create a new column for the cumulative average
df = df.assign(cum_avg = df['numbers'].expanding().mean())
#    numbers  cum_avg
# 1        1      1.0
# 2        4      2.5
# 3        7      4.0
# 4        4      4.0
# 5       19      7.0
# plot
df['cum_avg'].plot(linestyle = 'none',
                   marker = 'o',
                   xticks = df.index,
                   xlabel = 'Entry',
                   ylabel = 'Cumulative average')
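If you want to convince yourself that the two approaches give the same numbers, a quick check along these lines should do. This is only a sketch; the variable names via_cumsum and via_expanding are made up for illustration:

```python
import numpy as np
import pandas as pd

numbers = pd.Series([1, 4, 7, 4, 19], name="numbers")

# Approach 1: cumulative sum divided by the 1-based position
counts = np.arange(1, len(numbers) + 1)
via_cumsum = numbers.cumsum().to_numpy() / counts

# Approach 2: pandas expanding-window mean
via_expanding = numbers.expanding().mean().to_numpy()

# Compare with a tolerance rather than exact equality, since both are floats
print(np.allclose(via_cumsum, via_expanding))
```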
Upvotes: 0
Reputation: 16866
You can use cumsum to get the cumulative sum and then divide by the element count to get the running average.
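import numpy as np
x = np.array([1, 4, 7, 4, 19])
# divide each cumulative sum by the number of elements summed so far
z = np.cumsum(x) / np.arange(1, len(x) + 1)
print(z)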
output:
[1.  2.5 4.  4.  7. ]
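For completeness, the same idea also works without numpy. The sketch below uses itertools.accumulate from the standard library instead of cumsum; the function name running_average is just an illustrative choice:

```python
from itertools import accumulate


def running_average(values):
    """Return the cumulative average at each position of `values`."""
    # accumulate() yields the running totals; enumerate() supplies the 1-based count
    return [total / count
            for count, total in enumerate(accumulate(values), start=1)]


print(running_average([1, 4, 7, 4, 19]))  # [1.0, 2.5, 4.0, 4.0, 7.0]
```

This is handy for small lists, but for a whole CSV column the numpy or pandas versions will usually be more convenient.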
Upvotes: 0