Mean Streets of Silicon Valley
posted at 15:40 on 2009.11.20
Mean/variance calculation is ridiculously commonplace in data analysis, yet most programmers have never seen this gem from Knuth's The Art of Computer Programming (TAoCP):
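    def online_mean_and_variance(data):
        # n: count so far, mu: running mean, s2: running sum of squared deviations
        n, mu, s2 = 0, 0.0, 0.0
        for x in data:
            n += 1
            delta = x - mu          # deviation from the old mean
            mu += delta / n         # update the mean
            s2 += delta * (x - mu)  # uses both the old and the new mean
            if n > 1:
                # sample variance is undefined for a single value
                yield (mu, s2 / (n - 1))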
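Unlike the standard two-pass algorithm, this one is online: it reads each value once and keeps only three numbers of state. It also happens to be numerically stable, far more so than the textbook one-pass sum-of-squares formula.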
If that's not enough, I've given it to you here as a Python generator. Enjoy!
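For the curious, here is a minimal sketch of how the generator might be used and sanity-checked. The sample values, the `naive_variance` helper, and the comparison against the standard library's `statistics` module are my own illustrative assumptions, not part of the original recipe.

```python
import statistics


def online_mean_and_variance(data):
    """Yield (mean, sample variance) after each value, from the second onward."""
    n, mu, s2 = 0, 0.0, 0.0
    for x in data:
        n += 1
        delta = x - mu
        mu += delta / n
        s2 += delta * (x - mu)
        if n > 1:
            yield (mu, s2 / (n - 1))


def naive_variance(data):
    # Hypothetical helper for comparison only: the textbook one-pass
    # sum-of-squares formula, which subtracts two large, nearly equal numbers.
    n = len(data)
    total = sum(data)
    total_sq = sum(x * x for x in data)
    return (total_sq - total * total / n) / (n - 1)


# Illustrative data (an assumption, chosen so the arithmetic is easy to follow).
samples = [4.0, 7.0, 13.0, 16.0]

# The generator yields one running estimate per value after the first.
running = list(online_mean_and_variance(samples))
final_mean, final_var = running[-1]
print(running)

# Cross-check the final estimate against the standard library.
print(final_mean, statistics.mean(samples))
print(final_var, statistics.variance(samples))

# Shift the data by a large offset: the true variance is unchanged.
shifted = [x + 1e9 for x in samples]
shifted_var = list(online_mean_and_variance(shifted))[-1][1]
print("online:", shifted_var)
print("naive: ", naive_variance(shifted))
```

Because it is a generator, you can also feed it an iterator that never ends (a sensor feed, say) and pull estimates with `next()` as values arrive. The last two lines are there to let you see for yourself how the two formulas behave when the mean is large relative to the spread.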
