Reputation: 147
I am just learning how to use statsmodels (a Python module) to perform tasks such as regression analysis and testing for normality and homogeneity of variance. I am working with several data sets (usually CSV files) and would like to write a script to help me do this more efficiently. My data is just a set of numbers.
Example of data
column1, column2
2.80609,2.80609
2.39059,1.6697286666666666
3.6487540000000003,1.8243770000000001
1.8582885714285717,3.0046419047619044
2.587834,1.7252226666666666
...
Specifically, I would like to:
(1) Iterate a test over several files
(2) Save the result of each test to a new row in a new file
An example of one of the tests I am using.
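import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("data.csv")
column1 = data['column1']
# Pass the single column (not the whole DataFrame) to the test
p_value = sm.stats.diagnostic.lilliefors(column1, dist='norm', pvalmethod='approx')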
It returns a tuple of two floats (the test statistic and the p-value) that looks like this:
(0.08557045418097009, 7.144631930303909e-50)
My adventures into if/else and Boolean values resulted in this code, which prints a message that would be nifty to export as well.
p_value = sm.stats.diagnostic.lilliefors(curvature_length, dist='norm', pvalmethod='approx')[1]
if p_value < 0.05:
    print("Data is not normally distributed")
else:
    print("Data is normally distributed")
print(p_value)
Any tips & feedback on how to go about this would be greatly appreciated!
Upvotes: 2
Views: 445
Reputation: 1483
To write the two floats to a CSV file, use the csv module: https://docs.python.org/3/library/csv.html
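import csv

# `results` is a placeholder for the (statistic, p-value) pairs collected from the tests
results = [(0.08557045418097009, 7.144631930303909e-50)]

with open('eggs.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    # cycle through the pairs of floats, writing each pair as one row
    for float1, float2 in results:
        csvwriter.writerow([float1, float2])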
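To cover both parts of the question (running one test over several files and saving one row per file), the same idea can be put in a loop. The following is only a rough sketch under a few assumptions: the helper names `read_column`, `lilliefors_pvalue` and `summarize` are made up for illustration, the input files are read with the csv module rather than pandas, and the 0.05 threshold is taken from the question. Note that the header in the example data has a space after the comma, so `skipinitialspace=True` is used to keep the second column name clean.

```python
import csv


def lilliefors_pvalue(values):
    """Return (statistic, p-value) from the Lilliefors test in statsmodels."""
    # Imported here so the rest of the sketch can run without statsmodels.
    from statsmodels.stats.diagnostic import lilliefors
    return lilliefors(values, dist='norm', pvalmethod='approx')


def read_column(path, column):
    """Read one named column of a CSV file as a list of floats."""
    with open(path, newline='') as f:
        # skipinitialspace handles headers such as "column1, column2"
        reader = csv.DictReader(f, skipinitialspace=True)
        return [float(row[column]) for row in reader]


def summarize(paths, column, out_path, test=lilliefors_pvalue, alpha=0.05):
    """Run `test` on `column` of every file and write one result row per file."""
    with open(out_path, 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(['file', 'statistic', 'p_value', 'verdict'])
        for path in paths:
            statistic, p_value = test(read_column(path, column))
            if p_value < alpha:
                verdict = 'not normally distributed'
            else:
                verdict = 'no evidence against normality'
            writer.writerow([path, statistic, p_value, verdict])
```

With a hypothetical folder layout it could be called as `summarize(sorted(glob.glob('data/*.csv')), 'column1', 'results.csv')` after `import glob`. Passing the test as a parameter means another function that returns a (statistic, p-value) pair can be swapped in without changing the loop.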
Upvotes: 2