Reputation: 11
I know it's probably obvious how to solve it, but I am out of ideas...
I import a .csv file with Pandas into a DataFrame. The data has 3 columns, each with a single header: the 1st column has 45 rows, the 2nd column 40 rows, and the 3rd column 21 rows. The shape of the DataFrame is then (45, 3). The "missing" rows are filled with NaNs, and here my problem starts.
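For illustration, here is a minimal sketch of what such a DataFrame looks like after the import (the column names and random data are made up, just standing in for the real .csv):

import numpy as np
import pandas as pd

# stand-in for the imported .csv: the shorter columns are padded
# with NaN so every column ends up with the length of the longest one
df = pd.DataFrame({
    "A": np.random.randn(45),
    "B": np.concatenate([np.random.randn(40), np.full(5, np.nan)]),
    "C": np.concatenate([np.random.randn(21), np.full(24, np.nan)]),
})
print(df.shape)         # (45, 3)
print(df.isna().sum())  # NaNs per column: A 0, B 5, C 24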
I want to evaluate some statistics on the data with different SciPy functions, like the Anderson-Darling test, e.g.:
from scipy import stats

for i in df.columns:
    print([i])
    a = stats.anderson(df[i], dist='norm')
    print(a)
    # a[0] is the test statistic, a[1] the critical values, a[2] the significance levels
    if a[0] > a[1][2]:
        print('The null hypothesis can be rejected at', a[2][2], '% significance level')
    else:
        print('The null hypothesis cannot be rejected')
So, the first column gets evaluated just fine:
['Z79V0001']
AndersonResult(statistic=0.41768739435435975, critical_values=array([0.535, 0.609, 0.731, 0.853, 1.014]), significance_level=array([15. , 10. ,  5. ,  2.5,  1. ]))
The null hypothesis cannot be rejected
but for the others I get something like
['Z79V0003_1']
AndersonResult(statistic=nan, critical_values=array([0.535, 0.609, 0.731, 0.853, 1.014]), significance_level=array([15. , 10. ,  5. ,  2.5,  1. ]))
The null hypothesis cannot be rejected

Filling the NaN values with zeros does not help, because then the statistics are calculated the wrong way. I simply cannot figure out how to adjust the lengths of the columns so that the function works only on the rows where it finds numbers and, once it reaches NaN, moves on to the next column. Help would be very much appreciated.
Upvotes: 1
Views: 101
Reputation: 68186
This will be easiest if you pass numpy arrays to the stats function. You can use Series methods of each column to drop the NaNs:
from scipy import stats

for col in df.columns:
    # drop the NaNs from this column and pass the underlying numpy array
    a = stats.anderson(df[col].dropna().values, dist='norm')
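Putting that together with the hypothesis-test logic from the question, a minimal self-contained sketch could look like this (a made-up two-column DataFrame stands in for the real data; index 2 of the result arrays corresponds to the 5% level):

import numpy as np
import pandas as pd
from scipy import stats

# stand-in for the NaN-padded DataFrame from the question
df = pd.DataFrame({
    "A": np.random.randn(45),
    "B": np.concatenate([np.random.randn(40), np.full(5, np.nan)]),
})

for col in df.columns:
    data = df[col].dropna().values   # only the rows that actually hold numbers
    a = stats.anderson(data, dist='norm')
    print([col], a)
    # index 2 corresponds to the 5 % significance level
    if a.statistic > a.critical_values[2]:
        print('The null hypothesis can be rejected at',
              a.significance_level[2], '% significance level')
    else:
        print('The null hypothesis cannot be rejected')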
Upvotes: 1