Kirill Kondratenko

Reputation: 377

Calculate the variance for each element in the sample separately

I have a DataFrame with the name of each attraction, the date, and the ride sum.

import pandas as pd

attr = pd.DataFrame(
    {'rides':['circuit','circuit',
              'roller coaster', 'roller coaster',
              'car', 'car', 'car',
              'train', 'train'],
    'date':['2019-06-22', '2019-06-23',
            '2019-06-29', '2019-07-06',
            '2019-09-01', '2019-09-07', '2019-09-08',
            '2019-09-14', '2019-09-15'],
    'ride_sum':[663, 483,
                858, 602,
                326, 2, 86,
                70, 134]})

    rides           date        ride_sum
0   circuit         2019-06-22  663
1   circuit         2019-06-23  483
2   roller coaster  2019-06-29  858
3   roller coaster  2019-07-06  602
4   car             2019-09-01  326
5   car             2019-09-07  2
6   car             2019-09-08  86
7   train           2019-09-14  70
8   train           2019-09-15  134

I can calculate the variance manually, but my DataFrame has more than 1000 rows and more than 30 different rides.

For the example, the manual calculation looks like this:

print(attr.loc[attr['rides'] == 'circuit']['ride_sum'].var(),
      attr.loc[attr['rides'] == 'roller coaster']['ride_sum'].var(),
      attr.loc[attr['rides'] == 'car']['ride_sum'].var(),
      attr.loc[attr['rides'] == 'train']['ride_sum'].var())

16200.0 32768.0 28272.0 2048.0

I want to get a DataFrame with the variance for each ride, like this:

    rides           var
0   circuit         16200.0
1   roller coaster  32768.0
2   car             28272.0
3   train           2048.0

Upvotes: 1

Views: 194

Answers (3)

Do this:

attr.groupby("rides")["ride_sum"].agg(["var"]).reset_index()

EDIT:

For kurtosis, there is no aggregate. You need to do this:

attr.groupby("rides")["ride_sum"].apply(pd.Series.kurt).reset_index()

With your example, there are fewer than four values per group, so it'll return NaN for every group.
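
To see a non-NaN kurtosis, here is a minimal sketch with made-up numbers (the `demo` frame and its values are hypothetical, not taken from the question). One group has enough values to produce a result and the other does not:

```python
import pandas as pd

# Hypothetical data: group "a" has five values, group "b" only two.
demo = pd.DataFrame({
    "rides": ["a", "a", "a", "a", "a", "b", "b"],
    "ride_sum": [1, 2, 3, 4, 100, 10, 20],
})

# Select the numeric column first, then apply Series.kurt to each group.
kurt = (
    demo.groupby("rides")["ride_sum"]
        .apply(pd.Series.kurt)
        .reset_index(name="kurt")
)
print(kurt)
```

Group "a" gets a number, while group "b" comes back as NaN because it is too small.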

Upvotes: 2

unique

Reputation: 128

Use pandas' unique function to get the distinct rides, then loop over them and compute the variance for each one. Example:

unique_rides = pd.unique(attr['rides'])

for ride in unique_rides:
    print(attr.loc[attr['rides'] == ride, 'ride_sum'].var())
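
If the loop should produce the DataFrame asked for in the question rather than printed values, one possible sketch is to collect a row per ride and build the frame at the end. The names `rows` and `result` are just illustrative:

```python
import pandas as pd

attr = pd.DataFrame({
    "rides": ["circuit", "circuit", "roller coaster", "roller coaster",
              "car", "car", "car", "train", "train"],
    "ride_sum": [663, 483, 858, 602, 326, 2, 86, 70, 134],
})

rows = []
# unique() keeps the rides in order of first appearance.
for ride in attr["rides"].unique():
    variance = attr.loc[attr["rides"] == ride, "ride_sum"].var()
    rows.append({"rides": ride, "var": variance})

result = pd.DataFrame(rows)
print(result)
```

This rescans the whole frame once per ride, so for many rides a groupby is likely the better fit.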

Thank you

Upvotes: 0

rftr

Reputation: 1275

Try groupby together with var(), selecting ride_sum first so the date strings are not passed to var():

attr.groupby("rides")["ride_sum"].var().reset_index()

    rides           ride_sum
0   car             28272.0
1   circuit         16200.0
2   roller coaster  32768.0
3   train           2048.0

(reset_index() is optional; without it, rides stays as the index.)
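
To match the layout requested in the question (rides in their original order and a column named var), a small variation might look like this. It is a sketch that assumes the question's `attr` frame; `result` is an illustrative name:

```python
import pandas as pd

attr = pd.DataFrame({
    "rides": ["circuit", "circuit", "roller coaster", "roller coaster",
              "car", "car", "car", "train", "train"],
    "ride_sum": [663, 483, 858, 602, 326, 2, 86, 70, 134],
})

result = (
    attr.groupby("rides", sort=False)["ride_sum"]  # sort=False keeps first-seen order
        .var()                                      # sample variance (ddof=1 by default)
        .reset_index(name="var")                    # name the value column "var"
)
print(result)
```

With sort=False the groups come out as circuit, roller coaster, car, train instead of alphabetically.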

Upvotes: 4
