Reputation: 377
I have a DataFrame with the attraction name, the date, and the ride sum.
import pandas as pd
attr = pd.DataFrame(
    {'rides': ['circuit', 'circuit',
               'roller coaster', 'roller coaster',
               'car', 'car', 'car',
               'train', 'train'],
     'date': ['2019-06-22', '2019-06-23',
              '2019-06-29', '2019-07-06',
              '2019-09-01', '2019-09-07', '2019-09-08',
              '2019-09-14', '2019-09-15'],
     'ride_sum': [663, 483,
                  858, 602,
                  326, 2, 86,
                  70, 134]})
rides date ride_sum
0 circuit 2019-06-22 663
1 circuit 2019-06-23 483
2 roller coaster 2019-06-29 858
3 roller coaster 2019-07-06 602
4 car 2019-09-01 326
5 car 2019-09-07 2
6 car 2019-09-08 86
7 train 2019-09-14 70
8 train 2019-09-15 134
I can calculate the variance of ride_sum for each ride manually, but my DataFrame has more than 1000 rows and more than 30 different rides.
In the example, it looks like this:
print(attr.loc[attr['rides'] == 'circuit']['ride_sum'].var(),
attr.loc[attr['rides'] == 'roller coaster']['ride_sum'].var(),
attr.loc[attr['rides'] == 'car']['ride_sum'].var(),
attr.loc[attr['rides'] == 'train']['ride_sum'].var())
16200.0 32768.0 28272.0 2048.0
I want to get a DataFrame with the variance for each ride that looks like this:
rides var
0 circuit 16200.0
1 roller coaster 32768.0
2 car 28272.0
3 train 2048.0
Upvotes: 1
Views: 194
Reputation: 11489
Do this (selecting ride_sum first keeps the string date column out of the aggregation):
attr.groupby("rides")["ride_sum"].agg(["var"]).reset_index()
EDIT:
For kurtosis, there is no aggregate. You need to do this:
attr.groupby("rides")["ride_sum"].apply(pd.Series.kurt).reset_index()
With your example, there are fewer than four values per group, so it'll return NaN.
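To illustrate the point about group size, here is a minimal sketch on made-up data (the `demo` frame and its ride names "a" and "b" are hypothetical, not from the question). It assumes pandas' sample kurtosis returns NaN when a group has fewer than four non-missing values:

```python
import pandas as pd

# Hypothetical data: ride "a" has four observations, ride "b" only three.
demo = pd.DataFrame({
    "rides": ["a", "a", "a", "a", "b", "b", "b"],
    "ride_sum": [1, 2, 3, 4, 10, 20, 30],
})

# Apply Series.kurt to the ride_sum values of each group.
kurt_by_ride = demo.groupby("rides")["ride_sum"].apply(pd.Series.kurt)

# Ride "a" gets a finite value, ride "b" comes back as NaN.
print(kurt_by_ride)
```

Calling `.reset_index()` on `kurt_by_ride` turns it into a two-column DataFrame, as in the variance case.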
Upvotes: 2
Reputation: 128
Use pandas' unique method to get the distinct rides, then loop over them and compute the variance for each one. Example:
unique_rides = attr['rides'].unique()
for ride in unique_rides:
    print(attr.loc[attr['rides'] == ride, 'ride_sum'].var())
Thank you
Upvotes: 0
Reputation: 1275
Try groupby together with var() like this:
attr.groupby("rides")["ride_sum"].var().reset_index()
rides ride_sum
0 car 28272
1 circuit 16200
2 roller coaster 32768
3 train 2048
(reset_index() is not strictly required; without it, rides stays as the index.)
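If the result should match the layout asked for in the question (rides in their original order and a column named var), one possible sketch is below. It assumes a pandas version that supports named aggregation (0.25 or later); `sort=False` keeps groups in order of first appearance instead of sorting them alphabetically. The name `result` is only illustrative.

```python
import pandas as pd

# Same data as in the question.
attr = pd.DataFrame({
    "rides": ["circuit", "circuit", "roller coaster", "roller coaster",
              "car", "car", "car", "train", "train"],
    "date": ["2019-06-22", "2019-06-23", "2019-06-29", "2019-07-06",
             "2019-09-01", "2019-09-07", "2019-09-08",
             "2019-09-14", "2019-09-15"],
    "ride_sum": [663, 483, 858, 602, 326, 2, 86, 70, 134],
})

result = (
    attr.groupby("rides", sort=False)    # keep first-appearance order
        .agg(var=("ride_sum", "var"))    # output column "var" from ride_sum
        .reset_index()                   # turn rides back into a column
)
print(result)
```

Because only ride_sum is named in the aggregation, the string date column never enters the variance computation.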
Upvotes: 4