Tom M

Reputation: 3

Semi Standard Deviation in Pandas

Hi, I'm trying to write a function to calculate semi-standard deviation. However, I'm struggling to append the values that are less than the average to a new dataframe for the calculation.

def semistand(data,start,end):
    df = data.loc[(str(start)):(str(end))]
    lessthan=pd.DataFrame()
    mean_df= df.mean()
    for ind in df.index:
        if ind in df.index<mean_df:
            lessthan.append(df[index])

    return(mean_df,lessthan)

I'm pretty new to pandas and am finding it quite hard to get to grips with!

Upvotes: 0

Views: 3682

Answers (2)

Sajan

Reputation: 1267

You don't need to append the values below the mean to a new dataframe. Instead, you could select them with a boolean mask and take the standard deviation of the result, something like this:

import pandas as pd
values = [24, 87, 30, 73, 98, 84, 75, 21, 90, 70, 99, 83, 28, 37, 28, 79, 43, 40, 64, 41]
df = pd.DataFrame({'a':values})
df[df['a'] < df['a'].mean()]['a'].std()

7.986099033807293
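Note that this computes the standard deviation of the below-mean values around their own mean. A common definition of semi-standard deviation (downside deviation) instead measures how far the below-mean values fall from the mean of the whole sample. Conventions also differ on the denominator: some divide by the number of below-mean observations, others by the full sample size. The sketch below uses the below-mean count by default; semi_std and its full_sample flag are hypothetical names chosen for illustration, not a pandas API.

```python
import numpy as np
import pandas as pd


def semi_std(s: pd.Series, full_sample: bool = False) -> float:
    """Downside semi-standard deviation of a numeric Series (illustrative sketch).

    Deviations are measured from the mean of the WHOLE Series.
    full_sample=False divides by the number of below-mean observations;
    full_sample=True divides by len(s). Both conventions are in use.
    """
    mean = s.mean()
    below = s[s < mean]              # keep only observations under the overall mean
    if below.empty:                  # e.g. a constant series: no downside at all
        return 0.0
    sq_sum = ((below - mean) ** 2).sum()
    n = len(s) if full_sample else len(below)
    return float(np.sqrt(sq_sum / n))


values = [24, 87, 30, 73, 98, 84, 75, 21, 90, 70,
          99, 83, 28, 37, 28, 79, 43, 40, 64, 41]
s = pd.Series(values)
print(semi_std(s))                    # divides by the below-mean count
print(semi_std(s, full_sample=True))  # divides by the full sample size
```

Unlike the one-liner above, both variants keep the overall mean as the reference point, which is usually what a "downside risk" measure is meant to capture.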

Upvotes: 2

ysearka

Reputation: 3855

The problem in your function is the way you're trying to retrieve the index of the wanted values. You've written df.index<mean_df, which can't work for two reasons:

First, mean_df is a pandas Series containing the mean of every column of your dataframe, while df.index holds one label per row. Comparing the two element-wise doesn't make sense, and pandas will raise an error when their lengths differ.

Secondly, even if your data had a single column (which sidesteps the first point), you would be comparing your index labels to the mean value of your data, which I assume is not your objective. You need to compare the values inside your dataframe instead.
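To make the second point concrete, here is a small sketch contrasting the two comparisons (the Series s and its values are made up for illustration):

```python
import pandas as pd

s = pd.Series([10, 30, 20, 40])   # default RangeIndex: labels 0, 1, 2, 3; mean is 25.0

# Compares the row LABELS (0..3) with 25.0, so every row passes.
print(s.index < s.mean())

# Compares the VALUES with 25.0, which is what a semi-deviation needs.
print(s < s.mean())
```

The first comparison returns a plain boolean array that says nothing about the data; the second returns a boolean Series that can be used directly as a mask.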

Here is an example using a pandas series:

my_df = pd.Series([1,3,2,4])
my_df[my_df<my_df.mean()]

0    1
2    2
dtype: int64

Or, using a whole dataframe (each column is compared with its own mean):

my_df = pd.DataFrame()
my_df['a'] = [1,3,2,4]
my_df['b'] = [3,1,4,2]
my_df[my_df < my_df.mean()]

    a       b
0   1.0     NaN
1   NaN     1.0
2   2.0     NaN
3   NaN     2.0
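Putting this together, a possible rewrite of the question's semistand function might look like the sketch below. It assumes data has a sorted DatetimeIndex (so the string slicing in .loc works) and only numeric columns, and it returns one semi-standard deviation per column, measured from that column's mean over the window. The return value is a hypothetical design choice that differs from the original (mean_df, lessthan) tuple. The rewrite also sidesteps two other problems in the original loop: df[index] refers to an undefined name (the loop variable is ind), and DataFrame.append returns a new frame rather than modifying lessthan in place (it was removed altogether in pandas 2.0).

```python
import numpy as np
import pandas as pd


def semistand(data: pd.DataFrame, start, end) -> pd.Series:
    """Per-column downside semi-standard deviation over data.loc[start:end].

    Sketch only: assumes a sorted DatetimeIndex and numeric columns.
    Deviations are measured from each column's mean over the window and
    averaged over that column's below-mean observations.
    """
    df = data.loc[str(start):str(end)]
    mean_df = df.mean()               # one mean per column
    below = df[df < mean_df]          # values >= their column mean become NaN
    sq_dev = (below - mean_df) ** 2   # NaN entries stay NaN
    # .mean() skips NaN, so each column is averaged over its own below-mean count;
    # a constant column has no below-mean values and comes out as NaN.
    return np.sqrt(sq_dev.mean())


idx = pd.date_range("2020-01-01", periods=4, freq="D")
data = pd.DataFrame({"a": [1, 3, 2, 4], "b": [3, 1, 4, 2]}, index=idx)
print(semistand(data, "2020-01-01", "2020-01-04"))
```

Because the mask keeps the dataframe's shape, no loop or intermediate "lessthan" frame is needed; the NaNs produced by the mask are simply skipped by the column-wise mean.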

Upvotes: 1
