Reputation: 97
I'm trying to compute the standard deviation of a list `vr`. The list has 32 entries, each an array of size 3980. Each array holds a value at every height (3980 heights).
First I split the data into 15-minute chunks, where the observation times are given in `raytimes`. `raytimes` is also a list of size 32 (it contains just the time of each observation in `vr`).
I want the standard deviation computed at each height level, so that I end up with one final array of size 3980. That part works in my code. However, the code does not produce the correct standard deviation values when I test it: the values written to `w1sd`, `w2sd`, etc. are not correct, although each array has the correct size (3980 elements). I assume I am mixing up the indices when computing the standard deviation.
Below are example values from the dataset. All data should fall into `w1` and `w1sd`, as the `raytimes` provided in this example are all within 15 minutes (< 0.25). I want to compute the standard deviation of the first element of each row of `vr`, that is, the standard deviation of 2.0, 3.1, 2.1, then of the second element, that is, the standard deviation of 3.1, 4.1, nan, and so on.
The result for `w1sd` SHOULD BE [0.497, 0.499, 1.0, 7.5], but instead the code below, which uses `nanstd`, gives w1sd = [0.497, 0.77, 1.31, 5.301]. Is something wrong with `nanstd` or with my indexing?
vr = [
    [2.0, 3.1, 4.1, nan],
    [3.1, 4.1, nan, 5.1],
    [2.1, nan, 6.1, 20.1]
]
Height = [10.0, 20.0, 30.0, 40]
raytimes = [0, 0.1, 0.2]
for j, h in enumerate(Height):
    for i, t in enumerate(raytimes):
        if raytimes[i] < 0.25:
            w1.append(float(vr[i][j]))
        elif 0.25 <= raytimes[i] < 0.5:
            w2.append(float(vr[i][j]))
        elif 0.5 <= raytimes[i] < 0.75:
            w3.append(float(vr[i][j]))
        else:
            w4.append(float(vr[i][j]))
    w1sd.append(round(nanstd(w1), 3))
    w2sd.append(round(nanstd(w2), 3))
    w3sd.append(round(nanstd(w3), 3))
    w4sd.append(round(nanstd(w4), 3))
w1 = []
w2 = []
w3 = []
w4 = []
Upvotes: 0
Views: 124
Reputation: 26
I would consider using `pandas` for this. It is a library for efficient processing of datasets held in `numpy` arrays, and it takes the looping and indexing out of your hands.
In this case I would define a dataframe with `N_raytimes` rows and `N_Height` columns, which lets you slice and aggregate the data any way you like.
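As a rough sketch of that kind of aggregation, using the small example from the question: `pd.cut` can label each row with its 15-minute window, and `groupby` can then compute the per-height standard deviation for every window in one step. The bin edges below are an assumption chosen to mirror the four windows in the question (everything below 0.25 in the first, everything from 0.75 up in the last), and the names `edges`, `labels`, `window` and `sd` are made up for this illustration.

```python
import numpy as np
import pandas as pd

vr = [
    [2.0, 3.1, 4.1, np.nan],
    [3.1, 4.1, np.nan, 5.1],
    [2.1, np.nan, 6.1, 20.1],
]
Height = [10.0, 20.0, 30.0, 40]
raytimes = [0, 0.1, 0.2]

# Rows are observation times, columns are heights
df = pd.DataFrame(vr, columns=Height, index=raytimes)

# Assumed window edges; right=False makes each bin [lower, upper)
edges = [-np.inf, 0.25, 0.5, 0.75, np.inf]
labels = ["w1", "w2", "w3", "w4"]
window = pd.cut(df.index, bins=edges, labels=labels, right=False)

# One row per non-empty window, one column per height; NaNs are skipped
sd = df.groupby(window, observed=True).std(ddof=0)

w1sd = sd.loc["w1"].round(3).tolist()
print(w1sd)
```

Windows that contain no observations are simply absent from `sd` because of `observed=True`, so with the three example times only the `w1` row is present.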
The following code gives the expected output.
import pandas as pd
import numpy as np
vr = [
    [2.0, 3.1, 4.1, np.nan],
    [3.1, 4.1, np.nan, 5.1],
    [2.1, np.nan, 6.1, 20.1]
]
Height = [10.0, 20.0, 30.0, 40]
raytimes = [0, 0.1, 0.2]
# Define a dataframe with the data
df = pd.DataFrame(vr, columns=Height, index=raytimes)
df.columns.name = "Height"
df.index.name = "raytimes"
# Split it out (this could be more elegant)
w1 = df[df.index < 0.25]
w2 = df[(df.index >= 0.25) & (df.index < 0.5)]
w3 = df[(df.index >= 0.5) & (df.index < 0.75)]
w4 = df[df.index >= 0.75]
# Compute standard deviations
w1sd = w1.std(axis=0, ddof=0).values
w2sd = w2.std(axis=0, ddof=0).values
w3sd = w3.std(axis=0, ddof=0).values
w4sd = w4.std(axis=0, ddof=0).values
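For comparison, the same column-wise result can be sketched with NumPy alone, assuming `vr` fits in a 2-D array of shape (number of times, number of heights). `np.nanstd(..., axis=0)` ignores NaNs and, like `ddof=0` above, returns the population standard deviation. The sketch also pools the first two height columns to show where a value such as the 0.77 reported in the question can come from: that is the figure you get if samples from an earlier height are still in `w1` when the next height is processed. Reading the question's loop that way is an inference from the reported numbers, and the names `arr`, `times`, `in_w1` and `pooled` are invented here.

```python
import numpy as np

vr = [
    [2.0, 3.1, 4.1, np.nan],
    [3.1, 4.1, np.nan, 5.1],
    [2.1, np.nan, 6.1, 20.1],
]
raytimes = [0, 0.1, 0.2]

arr = np.asarray(vr, dtype=float)        # shape (n_times, n_heights)
times = np.asarray(raytimes, dtype=float)

# Boolean mask selecting the rows that belong to the first window
in_w1 = times < 0.25

# Standard deviation over time (axis=0), separately for each height
w1sd = np.round(np.nanstd(arr[in_w1], axis=0), 3)

# Pooling heights 0 and 1 together, as happens when a per-height list
# is not emptied before the next height is appended to it
pooled = np.round(np.nanstd(arr[in_w1][:, :2]), 3)

print(w1sd)
print(pooled)
```

If the loop version is kept, emptying `w1` to `w4` at the start of every height iteration should bring it in line with the per-column values.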
Upvotes: 1