Reputation: 45
I have this DataFrame
Date vix2
0 2010-01 [14, 15, 16]
1 2010-02 [19, 20, 22]
2 2010-03 [28, 40, 60]
3 2010-04 [36, 24, 30]
4 2010-05 [34, 25, 22]
5 2010-06 [24, 29, 30]
and I would like to get the median of N rows, where N here is 2, so I would like the result to be something like
Date median
0 2010-01 NaN
1 2010-02 17.5
2 2010-03 25.0
3 2010-04 33.0
4 2010-05 27.5
5 2010-06 27.0
This line of code seems to work but then I can't seem to get past the error
np.median(result['vix2'].values, axis=0)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
================================================================
EDIT:
The following solution worked for me, with the help of BENY and not_speshal
n = 2
result['new'] = pd.Series([np.median(np.concatenate(result['vix2'].iloc[x:x+n].values.tolist())) for x in result.index])
result['new'] = result['new'].shift(n-1)
Upvotes: 1
Views: 348
Reputation: 1052
My solution is a little dirty, but it covers your case fully:
N=2
rolling_index=[]
df.index.to_series().rolling(N).apply(lambda df: rolling_index.append(df.values) or 0, raw=False)
df['values']= [np.NaN]*(N-1)+[df.loc[i,'vix2'].explode().median() for i in rolling_index]
Result N=2:
Date vix2 values
0 1 [14, 15, 16] NaN
1 2 [19, 20, 22] 17.5
2 3 [28, 40, 60] 25.0
3 4 [36, 24, 30] 33.0
4 5 [34, 25, 22] 27.5
5 6 [24, 29, 30] 27.0
Result N=3:
Date vix2 values
0 1 [14, 15, 16] NaN
1 2 [19, 20, 22] NaN
2 3 [28, 40, 60] 20.0
3 4 [36, 24, 30] 28.0
4 5 [34, 25, 22] 30.0
5 6 [24, 29, 30] 29.0
Upvotes: 1
Reputation: 323396
If you only need rolling
number with 2, we can do shift
df['new'] = pd.DataFrame((df['vix2'] + df['vix2'].shift(1)).fillna({0:[]}).tolist()).median(1)
Out[98]:
0 NaN
1 17.5
2 25.0
3 33.0
4 27.5
5 27.0
dtype: float64
More comment way to deal with object
type rolling
n = 2
df.loc[n-1:,'new'] = [np.median(np.ravel(df['vix2'].iloc[x:x+n].values)) for x in df.index][:-n+1]
df
Out[121]:
vix2 new
0 [14, 15, 16] NaN
1 [19, 20, 22] 17.5
2 [28, 40, 60] 25.0
3 [36, 24, 30] 33.0
4 [34, 25, 22] 27.5
5 [24, 29, 30] 27.0
Upvotes: 2
Reputation: 23166
explode
and groupby
N = 2
df["values"] = pd.Series([df["vix2"].iloc[x-N+1:x+1].sum() for x in df.index])
output = df.explode("values").groupby("Date")["values"].median().reset_index()
>>> output
Date values
0 2010-01 NaN
1 2010-02 17.5
2 2010-03 25.0
3 2010-04 33.0
4 2010-05 27.5
5 2010-06 27.0
Upvotes: 1