Reputation: 10033
I have two samples from the population of neurons in the brain, each consisting of a thousand neuron instances, of categories cortex and cerebellum.
Now I'm extracting multiple metrics for each sample using complex network analysis, for example the neuron's degree of connectivity k, a discrete number (k = 0, 1, ..., n), or the clustering coefficient C, a continuous value between 0 and 1.
Here is df.sample(3) (where web is the category) from my pandas dataframes:
cortex:
web k clustering_coeff
3080 cortex 6.0 0.733333
2951 cortex 11.0 0.428571
1435 cortex 5.0 0.563571
...
cerebellum:
815 cerebellum 10.0 0.533333
850 cerebellum 9.0 0.416667
1213 cerebellum 7.0 0.454545
...
How can I use scipy.stats methods to compare both metrics and find out whether there is a statistically significant difference between the two groups?
The distributions look close to Gaussian but skewed to the right, so I'm not sure which approach is best: parametric or non-parametric, a t-test, and so on.
Any ideas?
Upvotes: 2
Views: 255
Reputation: 2691
For the "k" metric (with from scipy import stats):
stats.mannwhitneyu(df.loc[df.web=="cortex", "k"], df.loc[df.web=="cerebellum", "k"])
For the "clustering_coeff" metric:
stats.mannwhitneyu(df.loc[df.web=="cortex", "clustering_coeff"], df.loc[df.web=="cerebellum", "clustering_coeff"])
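In general, use a non-parametric test such as Mann-Whitney U if you don't know anything about the distribution under examination. It does not assume normality, so it also suits right-skewed data.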
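As a rough, self-contained sketch of the two calls above: the dataframe here is synthetic (Poisson and beta draws chosen only for illustration, not the asker's data), and compare_groups is a hypothetical helper name. It assumes a single dataframe df with the columns web, k and clustering_coeff, as in the question.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for the real data (assumed values, for illustration only).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "web": ["cortex"] * n + ["cerebellum"] * n,
    "k": np.concatenate([rng.poisson(8, n), rng.poisson(9, n)]).astype(float),
    "clustering_coeff": np.concatenate([rng.beta(5, 4, n), rng.beta(4, 5, n)]),
})

def compare_groups(frame, metric, group_col="web", a="cortex", b="cerebellum"):
    """Two-sided Mann-Whitney U test of `metric` between groups a and b."""
    x = frame.loc[frame[group_col] == a, metric]
    y = frame.loc[frame[group_col] == b, metric]
    res = stats.mannwhitneyu(x, y, alternative="two-sided")
    return res.statistic, res.pvalue

# Run the same test for both metrics and report U and the p-value.
results = {m: compare_groups(df, m) for m in ["k", "clustering_coeff"]}
for metric, (u, p) in results.items():
    print(f"{metric}: U={u:.1f}, p={p:.3g}")
```

A small p-value (for example below 0.05) suggests the two groups differ on that metric. Since two metrics are tested, you may want to correct for multiple comparisons.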
Upvotes: 1