Two-sample equality: can I save distribution statistics and load them to compare against new data?

Question

I want to perform some statistical comparisons between train and test sets, specifically to compare how similar the distribution of each feature is across the two. Let's suppose we do this using the two-sample Kolmogorov-Smirnov (KS) test. The way I want to do the analysis is to first calculate the train-side part of the statistic, save it to disk, and then load only that saved result when new data comes in and combine it with the test data. In other words, I don't want to load the entire train data frame just to run the two-sample test. Is that possible somehow? If not with the KS test, maybe with some other measure, say the Kullback-Leibler divergence. Thanks.
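To make the workflow concrete, here is a minimal sketch of the kind of thing I have in mind, not a definitive implementation. It assumes a single numeric feature and replaces the exact test with a binned approximation: the train data is reduced to histogram edges and counts, which are saved with NumPy, and new data is later binned on the same edges. The function names (`summarize_train`, `save_summary`, `load_summary`, `compare_to_summary`), the bin count, and the smoothing constant `eps` are all hypothetical choices of mine. The KS-style value here is only evaluated at the bin edges, so it is a lower bound on the exact two-sample KS statistic, and it comes without a p-value.

```python
import numpy as np


def summarize_train(values, n_bins=50):
    """Reduce one train feature to histogram edges and counts (assumed summary)."""
    values = np.asarray(values, dtype=float)
    counts, edges = np.histogram(values, bins=n_bins)
    return {"edges": edges, "counts": counts}


def save_summary(path, summary):
    """Persist the summary; `path` is expected to end in .npz."""
    np.savez(path, edges=summary["edges"], counts=summary["counts"])


def load_summary(path):
    """Load a summary written by save_summary."""
    with np.load(path) as data:
        return {"edges": data["edges"], "counts": data["counts"]}


def compare_to_summary(summary, new_values, eps=1e-9):
    """Return (binned KS statistic, binned KL(train || new)) for new data."""
    edges = summary["edges"]
    train_counts = np.asarray(summary["counts"], dtype=float)
    new_values = np.asarray(new_values, dtype=float)

    # Values outside the train range are clipped into the outermost bins.
    clipped = np.clip(new_values, edges[0], edges[-1])
    new_counts, _ = np.histogram(clipped, bins=edges)

    p = train_counts / train_counts.sum()
    q = new_counts / new_counts.sum()

    # Largest gap between the two empirical CDFs, checked at bin edges only.
    ks_binned = float(np.max(np.abs(np.cumsum(p) - np.cumsum(q))))

    # Smooth both histograms so empty bins do not produce log(0) or division by zero.
    p_s = (p + eps) / (p + eps).sum()
    q_s = (q + eps) / (q + eps).sum()
    kl = float(np.sum(p_s * np.log(p_s / q_s)))
    return ks_binned, kl
```

The intended usage would be to call `summarize_train` and `save_summary` once per feature at training time, then `load_summary` and `compare_to_summary` whenever a new batch arrives, so that the train data frame itself never has to be reloaded.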

Answers (1)
