Steven Pauly

Reputation: 185

P-value normal test for multiple rows

I have the following simple code to run a normality test over an array:

import pandas as pd
from scipy import stats

# raw string, so the "\f" in the path is not read as an escape sequence
df = pd.read_excel(r"directory\file.xlsx")
x = df.iloc[:, 1:].values.flatten()
stats.normaltest(x, axis=None)

This gives me a p-value and a statistic, as expected. What I want now is to:

Add 2 columns to the file with this p-value and statistic. If I have multiple rows, I want to do it for every row (calculate the p-value and statistic for each row and add 2 columns containing these values).

Can someone help?

Upvotes: 1

Views: 737

Answers (1)

Ben.T

Reputation: 29635

If you want to calculate normaltest row-wise, do not flatten your data into x; pass the 2-D data and use axis=1 instead, for example:

import numpy as np
import pandas as pd
from scipy import stats
df = pd.DataFrame(np.random.random(105).reshape(5,21)) # to generate data
# calculate normaltest row-wise, skipping the first column as in your code
df['stat'], df['p'] = stats.normaltest(df.iloc[:,1:], axis=1)

Then df contains two columns, 'stat' and 'p', with the values you are looking for, IIUC.

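Note: to be able to perform normaltest, you need at least 8 values per row (in my experience; the skewness test it relies on requires 8 observations), so df.iloc[:,1:] needs at least 8 columns. With fewer, it will raise an error, or, depending on your SciPy version, possibly return NaN with a warning. Even then, it is better to have 20 or more values in each row, since the kurtosis part of the test warns that it is only valid from n=20.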
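
Putting this together, here is a minimal sketch of a helper that checks the number of value columns before running the test and returns the data with the two new columns. The helper name add_normaltest_columns, its first_data_col parameter, the MIN_VALUES constant and the output file name are made up for illustration, and the demo uses random data in place of your Excel file.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Assumption for this sketch: 8 is the smallest row length normaltest accepts
# (its skewness test requires at least 8 observations).
MIN_VALUES = 8


def add_normaltest_columns(df, first_data_col=1):
    """Return a copy of df with row-wise 'stat' and 'p' columns appended."""
    # Use every column from first_data_col onwards as the sample for each row.
    values = df.iloc[:, first_data_col:].to_numpy(dtype=float)
    if values.shape[1] < MIN_VALUES:
        raise ValueError(
            f"normaltest needs at least {MIN_VALUES} values per row, "
            f"got {values.shape[1]}"
        )
    # axis=1 runs one test per row instead of one test on the flattened data.
    result = stats.normaltest(values, axis=1)
    out = df.copy()
    out["stat"] = result.statistic
    out["p"] = result.pvalue
    return out


# Demo on random data: 5 rows, 1 leading column + 20 value columns.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((5, 21)))
demo_out = add_normaltest_columns(demo)

# To save the result (hypothetical file name, requires an Excel writer
# such as openpyxl):
# demo_out.to_excel("file_with_normaltest.xlsx", index=False)
```

Working on a copy keeps the original DataFrame untouched, so you can re-run the test with a different first_data_col without having to drop the 'stat' and 'p' columns first.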

Upvotes: 1
