Reputation: 24555
I am trying to use sklearn.neural_network.BernoulliRBM with the iris dataset:
import pandas as pd
from sklearn import datasets
iris = datasets.load_iris()
collist = ['SL', 'SW', 'PL', 'PW']  # sepal/petal length and width
dat = pd.DataFrame(data=iris.data, columns=collist)
from sklearn.neural_network import BernoulliRBM
model = BernoulliRBM(n_components=2)
scores = model.fit_transform(dat)
print(scores.shape)
print(scores)
However, the output is 1 for every row:
(150, 2)
[[1. 1.]
[1. 1.]
[1. 1.]
[1. 1.]
[1. 1.] # same for all rows
Can I get per-row scores from the RBM, similar to what principal component analysis gives for individual rows? If not, how can I get some useful numbers out of an RBM? I also tried model.score_samples(dat), but that returns 0 for the vast majority of rows.
Upvotes: 1
Views: 1049
Reputation: 3286
According to the documentation:
The model makes assumptions regarding the distribution of inputs. At the moment, scikit-learn only provides BernoulliRBM, which assumes the inputs are either binary values or values between 0 and 1, each encoding the probability that the specific feature would be turned on.
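You can verify the input ranges yourself with a quick sanity check on the dat DataFrame from your question:
print(dat.min())
print(dat.max())
# the iris measurements are in centimetres, so most values sit well above 1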
Since your dat values are almost all greater than 1, I'm guessing the hidden units are simply saturating at 1.0. If, for example, you apply a normalization:
from sklearn.preprocessing import normalize
scores = model.fit_transform(normalize(dat))
You'll get values with some variation:
array([[0.23041219, 0.23019722],
[0.23046652, 0.23025144],
...,
[0.23159369, 0.23137678],
[0.2316786 , 0.23146158]])
Since your input features must be interpretable as probabilities, you'll want to think about what, if any, normalization is reasonable for the particular problem you are solving.
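For example, if treating each feature as an independent "on" probability fits your problem better than row-wise normalization, rescaling each column into [0, 1] is one option; this is just a sketch of that alternative, not a recommendation for every dataset:
from sklearn.preprocessing import MinMaxScaler
# rescale each column into [0, 1] so it can be read as a probability of the feature being "on"
scaled = MinMaxScaler().fit_transform(dat)
model = BernoulliRBM(n_components=2, random_state=0)
scores = model.fit_transform(scaled)  # hidden-unit activations should now vary across rows
Note that normalize rescales each row to unit norm, while MinMaxScaler rescales each column independently, so the two encode different assumptions about what the "probabilities" mean.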
Upvotes: 1