How can I pickle or save a scipy kde for later use?
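import numpy as np
import scipy.stats as scs
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone package
data = np.random.randn(1000)  # placeholder data so the example is reproducible
kde = scs.gaussian_kde(data, bw_method=.15)
joblib.dump(kde, 'test.pkl')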
I tried the above and received this error:
PicklingError: Can't pickle <function gaussian_kde.set_bandwidth.<locals>.<lambda> at 0x1a5b6fb7b8>: it's not found as scipy.stats.kde.gaussian_kde.set_bandwidth.<locals>.<lambda>
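One way to confirm which part of an object breaks pickling is to try pickling each instance attribute on its own. The helper below, `find_unpicklable_attrs`, is a hypothetical diagnostic sketch (not part of SciPy or joblib), demonstrated on a made-up `Holder` class that stores a lambda on itself, which is what the traceback suggests `gaussian_kde` does. Calling it on the `kde` from the question should name the attribute holding the lambda; which attribute that is may depend on the SciPy version, so it is not asserted here.

```python
import pickle


def find_unpicklable_attrs(obj):
    """Hypothetical helper: return names of instance attributes the standard pickler rejects."""
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception:  # PicklingError, AttributeError, TypeError, ...
            bad.append(name)
    return bad


class Holder:
    """Made-up stand-in for an object that keeps a lambda as an attribute."""

    def __init__(self, bw):
        self.bw = bw                  # a plain float pickles fine
        self.factor_fn = lambda: bw   # a lambda does not


print(find_unpicklable_attrs(Holder(0.15)))  # ['factor_fn']
```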
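It looks like joblib is having trouble with the set_bandwidth method, most likely because of the lambda function it creates. Pickling lambdas has been discussed before: the standard pickler saves a function as a reference to its importable name, and a lambda has none. Dumping a bare lambda with joblib fails the same way: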
import joblib
with open('test.pkl', 'wb') as fo:
    joblib.dump(lambda x, y: x + y, fo)
PicklingError: Can't pickle <function <lambda> at 0x7ff89495d598>: it's not found as __main__.<lambda>
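The same behaviour can be reproduced with the standard library `pickle` module alone, which suggests the limitation is in pickle itself rather than in joblib. A minimal sketch, assuming it is run as a script so that `add` is importable as `__main__.add` (the names `add` and `lambda_pickled` are illustrative only):

```python
import pickle


def add(x, y):
    """A named, module-level function: pickle stores it as a reference to its qualified name."""
    return x + y


# Round-trips fine, because the loader can look `add` up again by name.
restored_add = pickle.loads(pickle.dumps(add))
print(restored_add(2, 3))  # 5

# A lambda's __qualname__ is just "<lambda>", which cannot be looked up again,
# so the standard pickler refuses it.
try:
    pickle.dumps(lambda x, y: x + y)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError) as exc:
    lambda_pickled = False
    print(type(exc).__name__, exc)
```

cloudpickle and dill get around this by serializing the function's code object by value instead of by name.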
cloudpickle and dill both work as far as I can tell:
import cloudpickle
import dill
with open('test.cp.pkl', 'wb') as f:
    cloudpickle.dump(kde, f)
with open('test.dill.pkl', 'wb') as f:
    dill.dump(kde, f)
with open('test.cp.pkl', 'rb') as f:
    kde_cp = cloudpickle.load(f)
with open('test.dill.pkl', 'rb') as f:
    kde_dill = dill.load(f)
Inspect some of the data:
import numpy as np
print(np.array_equal(kde.dataset, kde_cp.dataset))    # True
print(np.array_equal(kde.dataset, kde_dill.dataset))  # True
print(np.array_equal(kde_cp.dataset, kde_dill.dataset))  # True
kde.pdf(10) == kde_cp.pdf(10) == kde_dill.pdf(10)  # array([ True])
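If adding a third-party serializer is not an option, a different approach (not from the original answer) is to avoid pickling the object at all: save the plain numbers that define the KDE and rebuild it on load. This is a sketch under stated assumptions: it requires SciPy 1.2 or later (for the `weights` argument and attribute), relies on the documented `dataset`, `factor`, and `weights` attributes of `gaussian_kde`, and uses the hypothetical helper names `save_kde_params` and `load_kde_params`. Passing the stored `factor` back as a scalar `bw_method` reproduces the same bandwidth.

```python
import os
import tempfile

import numpy as np
import scipy.stats as scs


def save_kde_params(kde, path):
    """Hypothetical helper: store only plain arrays (no functions) in an .npz file."""
    np.savez(path, dataset=kde.dataset, factor=kde.factor, weights=kde.weights)


def load_kde_params(path):
    """Hypothetical helper: rebuild a gaussian_kde from the stored arrays."""
    with np.load(path) as params:
        return scs.gaussian_kde(
            params["dataset"],
            bw_method=float(params["factor"]),  # a scalar bw_method becomes kde.factor directly
            weights=params["weights"],
        )


# Usage with placeholder data (the question's real data is not available).
rng = np.random.default_rng(0)
kde = scs.gaussian_kde(rng.normal(size=500), bw_method=0.15)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kde_params.npz")
    save_kde_params(kde, path)
    restored = load_kde_params(path)

grid = np.linspace(-3, 3, 50)
print(np.allclose(kde.pdf(grid), restored.pdf(grid)))
```

Because the file holds only NumPy arrays, loading it does not depend on SciPy's private attributes staying the same between versions, which a pickled object would.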