Python method for calculating conditional means and variances?

Question

Is there a standard way in Python to calculate the conditional means and variances of pandas DataFrame variables? The aim is to test the data for overdispersion or underdispersion, as a prerequisite for deciding whether a Poisson or a Negative Binomial regression model is more suitable.
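For background, the comparison rests on these standard moment relations. They are textbook results and are not stated in the question. A Poisson model forces the conditional variance to equal the conditional mean, while the common NB2 parameterisation of the Negative Binomial adds a quadratic term:

$$
\begin{aligned}
\text{Poisson:}\quad & \operatorname{Var}(Y \mid X) = \operatorname{E}(Y \mid X) = \mu \\
\text{Negative Binomial (NB2):}\quad & \operatorname{Var}(Y \mid X) = \mu + \alpha\mu^{2}, \quad \alpha > 0
\end{aligned}
$$

So the conditional variance-to-mean ratio is 1 under Poisson and greater than 1 under NB2. A ratio clearly above 1 suggests overdispersion. A ratio clearly below 1 (underdispersion) is captured by neither model.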

Scanning around the R ecosystem and Cross Validated, I think R has some packages with built-in dispersion tests, but I can't find a Python equivalent in pandas, SciPy or StatsModels.
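One test that can be assembled from SciPy is the classic index-of-dispersion test. The sketch below builds it from `scipy.stats.chi2` under two assumptions: a single count column is tested on its own (no covariates), and `dispersion_test` is a hypothetical helper name, not a SciPy function.

```python
import numpy as np
from scipy import stats


def dispersion_test(counts):
    """Index-of-dispersion test for one count variable (no covariates).

    Under a Poisson null, (n - 1) * s**2 / mean is approximately
    chi-squared with n - 1 degrees of freedom.
    """
    y = np.asarray(counts, dtype=float)
    n = y.size
    if n < 2:
        raise ValueError("need at least two observations")
    mean = y.mean()
    if mean == 0:
        raise ValueError("all-zero counts: the dispersion index is undefined")
    index = y.var(ddof=1) / mean  # variance-to-mean ratio, about 1 under Poisson
    statistic = (n - 1) * index
    # One-sided p-values: a large statistic points to overdispersion,
    # a small one to underdispersion.
    p_over = stats.chi2.sf(statistic, df=n - 1)
    p_under = stats.chi2.cdf(statistic, df=n - 1)
    return index, statistic, p_over, p_under


# Example: the `aspunet` values from the question's data.
index, statistic, p_over, p_under = dispersion_test([0, 11, 0, 0, 8, 0, 0, 0])
print(f"var/mean = {index:.2f}, chi2 = {statistic:.2f}, p(over) = {p_over:.2g}")
```

In practice you would pass a whole column, e.g. `df["aspunet"]`, from all 25,000 rows. Because the test ignores covariates, it checks marginal rather than conditional dispersion.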

This is the head of the data I'm working with. There are 25,000 observations.

aspunet  c_#  c_++  Ruby  java
0        0    0     0     6
11       0    0     0     0
0        0    7     0     0
0        0    0     9     0
8        0    0     0     0
0        2    0     0     0
0        0    0     4     0
0        0    0     0     6
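To make the examples reproducible, the rows above can be re-typed by hand as a DataFrame. This sketch computes the unconditional (marginal) mean, sample variance and their ratio for each column. It is only a first look, not a conditional analysis.

```python
import pandas as pd

# The eight rows shown above, re-typed by hand (column names as in the question).
df = pd.DataFrame(
    {
        "aspunet": [0, 11, 0, 0, 8, 0, 0, 0],
        "c_#": [0, 0, 0, 0, 0, 2, 0, 0],
        "c_++": [0, 0, 7, 0, 0, 0, 0, 0],
        "Ruby": [0, 0, 0, 9, 0, 0, 4, 0],
        "java": [6, 0, 0, 0, 0, 0, 0, 6],
    }
)

# Marginal mean, sample variance (ddof=1 by default) and variance-to-mean ratio per column.
means = df.mean()
variances = df.var()
summary = pd.DataFrame({"mean": means, "var": variances, "var/mean": variances / means})
print(summary)
```

With only eight rows these ratios say very little. The same lines can be run unchanged on the full 25,000-row DataFrame.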

G.S · Accepted Answer

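# df is the pandas DataFrame from the question. For each column, group the rows
# by that column's values, then take the mean and variance of every other column.
conditional = [df.groupby(col_name) for col_name in df.columns]
mean        = [cond.mean() for cond in conditional]
# Sample variance (ddof=1), so any group with a single row gives NaN.
var         = [cond.var() for cond in conditional]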
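The answer conditions every column on every other column. For choosing between Poisson and Negative Binomial, the quantity that usually matters is the response's mean and variance within groups of a predictor. The sketch below makes two assumptions: one column is picked as the response (`java`, purely for illustration) and another as the grouping variable. `conditional_dispersion` is a hypothetical helper, not a pandas function.

```python
import pandas as pd


def conditional_dispersion(df, response, by):
    """Count, mean, sample variance and variance-to-mean ratio of `response`
    within each distinct value of the column `by`.

    Groups with a single row get a NaN variance (pandas uses ddof=1).
    """
    out = df.groupby(by)[response].agg(["count", "mean", "var"])
    out["var/mean"] = out["var"] / out["mean"]
    return out


# Illustration on the question's eight rows, treating `java` as a hypothetical
# response and `aspunet` as the conditioning variable.
df = pd.DataFrame(
    {
        "aspunet": [0, 11, 0, 0, 8, 0, 0, 0],
        "c_#": [0, 0, 0, 0, 0, 2, 0, 0],
        "c_++": [0, 0, 7, 0, 0, 0, 0, 0],
        "Ruby": [0, 0, 0, 9, 0, 0, 4, 0],
        "java": [6, 0, 0, 0, 0, 0, 0, 6],
    }
)
result = conditional_dispersion(df, response="java", by="aspunet")
print(result)
```

When a predictor has many distinct values, most groups will be tiny. Binning it first, for example with `pd.cut`, is one option.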
