Reputation: 185
I have the following simple code to test an array for normality:
import numpy as np
import pandas as pd
from scipy import stats

df = pd.read_excel(r"directory\file.xlsx")  # raw string so "\f" is not read as an escape
x = df.iloc[:, 1:].values.flatten()  # every column except the first, as one flat array
stats.normaltest(x, axis=None)
This gives me a p-value and a statistic. What I want now is to add two columns to the file holding the p-value and the statistic. If there are multiple rows, I want to do this for every row (calculate the p-value and statistic per row and add two columns with these values).
Can someone help?
Upvotes: 1
Views: 737
Reputation: 29635
If you want to calculate normaltest row-wise, you should not flatten your data into x, and you should use axis=1, such as:
import numpy as np
import pandas as pd
from scipy import stats
df = pd.DataFrame(np.random.random(105).reshape(5, 21))  # generate example data
# calculate normaltest row-wise, skipping the first column as in the question
df['stat'], df['p'] = stats.normaltest(df.iloc[:, 1:], axis=1)
Then df contains two columns, 'stat' and 'p', with the values you are looking for, IIUC.
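To get the two columns back into the spreadsheet, as the question asks, the augmented frame can be written out again. Here is a minimal sketch, assuming the first column holds labels and every other column is numeric. The helper name add_normaltest_columns and the output file name are made up for illustration; they are not part of pandas or SciPy.

```python
import numpy as np
import pandas as pd
from scipy import stats


def add_normaltest_columns(df):
    """Return a copy of df with row-wise normaltest results appended.

    Assumes the first column is a label and all other columns are numeric.
    This helper is hypothetical, written only for this example.
    """
    values = df.iloc[:, 1:].to_numpy(dtype=float)
    result = stats.normaltest(values, axis=1)  # one statistic and p-value per row
    out = df.copy()
    out["stat"] = result.statistic
    out["p"] = result.pvalue
    return out


# Demo data: 5 rows, one label column plus 20 numeric columns
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((5, 20)))
demo.insert(0, "label", list("abcde"))

with_tests = add_normaltest_columns(demo)
print(with_tests[["label", "stat", "p"]])

# Writing back to Excel would then look like this
# (hypothetical file name; requires an Excel writer such as openpyxl):
# with_tests.to_excel("file_with_normaltest.xlsx", index=False)
```

Working on a copy keeps the original frame free of the 'stat' and 'p' columns, so calling the helper twice does not feed the first run's results into the second test.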
Note: to be able to perform normaltest, you need at least 8 values per row (in my experience), so df.iloc[:,1:] needs at least 8 columns; otherwise it will raise an error. Even then, it would be better to have more than 20 values in each row.
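The size requirement in the note can be checked up front so that the failure message is explicit. This is only a sketch: the wrapper rowwise_normaltest and the constant MIN_VALUES are hypothetical names, and the threshold of 8 is the one reported in the note, not something queried from SciPy.

```python
import numpy as np
from scipy import stats

MIN_VALUES = 8  # threshold taken from the note above (an assumption, not read from SciPy)


def rowwise_normaltest(values):
    """Run normaltest along axis=1 after checking the number of values per row."""
    values = np.asarray(values, dtype=float)
    n_values = values.shape[1]
    if n_values < MIN_VALUES:
        raise ValueError(
            f"normaltest needs at least {MIN_VALUES} values per row, got {n_values}"
        )
    return stats.normaltest(values, axis=1)


rng = np.random.default_rng(1)

# Enough values per row: the test runs and returns one result per row
stat, p = rowwise_normaltest(rng.random((3, 25)))
print(stat.shape, p.shape)

# Too few values per row: the guard raises before SciPy is called
try:
    rowwise_normaltest(rng.random((3, 5)))
except ValueError as err:
    message = str(err)
    print(message)
```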
Upvotes: 1