lurence
lurence

Reputation: 45

Can I get the median of the values accross multiple rows in pandas dataframe?

I have this DataFrame

      Date          vix2
0  2010-01  [14, 15, 16]
1  2010-02  [19, 20, 22]
2  2010-03  [28, 40, 60]
3  2010-04  [36, 24, 30]
4  2010-05  [34, 25, 22]
5  2010-06  [24, 29, 30]

and I would like to get the median of N rows, where N here is 2, so I would like the result to be something like

      Date  median
0  2010-01     NaN
1  2010-02    17.5
2  2010-03    25.0
3  2010-04    33.0
4  2010-05    27.5
5  2010-06    27.0

This line of code seems to work but then I can't seem to get past the error

np.median(result['vix2'].values, axis=0)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

================================================================

EDIT:

The following solution worked for me, with the help of BENY and not_speshal

n = 2
result['new'] = pd.Series([np.median(np.concatenate(result['vix2'].iloc[x:x+n].values.tolist())) for x in result.index])
result['new'] = result['new'].shift(n-1)

Upvotes: 1

Views: 348

Answers (3)

Pab
Pab

Reputation: 1052

My solution is a little dirty, but it covers your case fully:

N=2
rolling_index=[]
df.index.to_series().rolling(N).apply(lambda df: rolling_index.append(df.values) or 0, raw=False)
df['values']= [np.NaN]*(N-1)+[df.loc[i,'vix2'].explode().median() for i in rolling_index]

Result N=2:

    Date    vix2    values
0   1   [14, 15, 16]    NaN
1   2   [19, 20, 22]    17.5
2   3   [28, 40, 60]    25.0
3   4   [36, 24, 30]    33.0
4   5   [34, 25, 22]    27.5
5   6   [24, 29, 30]    27.0

Result N=3:

    Date    vix2    values
0   1   [14, 15, 16]    NaN
1   2   [19, 20, 22]    NaN
2   3   [28, 40, 60]    20.0
3   4   [36, 24, 30]    28.0
4   5   [34, 25, 22]    30.0
5   6   [24, 29, 30]    29.0

Upvotes: 1

BENY
BENY

Reputation: 323396

If you only need rolling number with 2, we can do shift

df['new'] = pd.DataFrame((df['vix2'] + df['vix2'].shift(1)).fillna({0:[]}).tolist()).median(1)
Out[98]: 
0     NaN
1    17.5
2    25.0
3    33.0
4    27.5
5    27.0
dtype: float64

More comment way to deal with object type rolling

n = 2
df.loc[n-1:,'new'] = [np.median(np.ravel(df['vix2'].iloc[x:x+n].values)) for x in df.index][:-n+1]
df
Out[121]: 
           vix2   new
0  [14, 15, 16]   NaN
1  [19, 20, 22]  17.5
2  [28, 40, 60]  25.0
3  [36, 24, 30]  33.0
4  [34, 25, 22]  27.5
5  [24, 29, 30]  27.0

Upvotes: 2

not_speshal
not_speshal

Reputation: 23166

  1. Combine values for every N (=2) rows into a single row.
  2. Use explode and groupby
N = 2
df["values"] = pd.Series([df["vix2"].iloc[x-N+1:x+1].sum() for x in df.index])
output = df.explode("values").groupby("Date")["values"].median().reset_index()

>>> output
      Date  values
0  2010-01     NaN
1  2010-02    17.5
2  2010-03    25.0
3  2010-04    33.0
4  2010-05    27.5
5  2010-06    27.0

Upvotes: 1

Related Questions