RDJ

Reputation: 4122

Python method for calculating conditional means and variances?

Is there a standard way in Python to calculate the conditional means and variances of the variables in a pandas DataFrame? The aim is to test the data for overdispersion or underdispersion before deciding whether a Poisson or a negative binomial model is more suitable for regression.

From scanning the R ecosystem and Cross Validated, I think R has some packages with built-in methods for checking dispersion. But I can't find a Python equivalent in pandas, SciPy or StatsModels.

This is the head of the data I'm working with. There are 25,000 observations.

aspunet  c_#  c_++  Ruby  java
0        0    0     0     6
11       0    0     0     0
0        0    7     0     0
0        0    0     9     0
8        0    0     0     0
0        2    0     0     0
0        0    0     4     0
0        0    0     0     6

Upvotes: 4

Views: 2132

Answers (1)

G.S

Reputation: 402

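# One GroupBy per column: rows are grouped by that column's values.
conditional = [df.groupby(col_name) for col_name in df.columns]
# Mean and variance of the remaining columns within each group.
mean        = [cond.mean() for cond in conditional]
var         = [cond.var() for cond in conditional]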
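
For the dispersion check itself, a rough sketch could compare each column's variance to its mean, since a Poisson variable has the two equal. The DataFrame below is rebuilt from the eight rows shown in the question, and the names `summary`, `var_to_mean` and `by_java` are only illustrative. With so few rows the numbers mean little; this just shows the mechanics, not a formal test.

```python
import pandas as pd

# The eight rows shown in the question (the full data has 25,000).
df = pd.DataFrame({
    "aspunet": [0, 11, 0, 0, 8, 0, 0, 0],
    "c_#":     [0, 0, 0, 0, 0, 2, 0, 0],
    "c_++":    [0, 0, 7, 0, 0, 0, 0, 0],
    "Ruby":    [0, 0, 0, 9, 0, 0, 4, 0],
    "java":    [6, 0, 0, 0, 0, 0, 0, 6],
})

# Unconditional mean and sample variance (ddof=1) for every column.
summary = pd.DataFrame({"mean": df.mean(), "var": df.var()})

# A ratio well above 1 hints at overdispersion, well below 1 at underdispersion.
summary["var_to_mean"] = summary["var"] / summary["mean"]
print(summary)

# Conditional version: mean and variance of one column within each
# level of another (here "Ruby" given "java", chosen arbitrarily).
by_java = df.groupby("java")["Ruby"].agg(["mean", "var"])
print(by_java)
```

If the ratio stays well above 1 on the full data, that would point towards the negative binomial model rather than Poisson.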

Upvotes: 4
