I am learning about machine learning, more precisely about logistic regression / classification. In my code I have a <class 'scipy.sparse.csr.csr_matrix'> object. I need to sort this sparse matrix, or the SFrame from which it was generated (using ...), according to the result of LogisticRegression.predict_proba. To be precise, I want to sort by the second column of the array that predict_proba returns (the second entry of each row).
How I generated the sparse matrix:
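import sframe
from sklearn.feature_extraction.text import CountVectorizer
products = sframe.SFrame('...')
train_data, test_data = products.random_split(.8, seed=1)
vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
# Learn the vocabulary on the training reviews, then reuse it for the test reviews
train_matrix = vectorizer.fit_transform(train_data['review_clean'])
test_matrix = vectorizer.transform(test_data['review_clean'])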
How I calculate the probabilities:
sentiment_model.predict_proba(test_matrix)
(where sentiment_model is a classifier trained with logistic regression)
This gives me a <class 'numpy.ndarray'>, which looks like this:
[[ 4.65761066e-03 9.95342389e-01]
[ 9.75851270e-01 2.41487300e-02]
[ 9.99983374e-01 1.66258341e-05]]
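To pin down "the second column": assuming sentiment_model is a scikit-learn LogisticRegression, the columns of predict_proba follow sentiment_model.classes_, which for labels -1 and +1 is [-1, 1], so column 1 holds the probability of a positive review. A small sketch using the three rows printed above, showing that column and the row order np.argsort derives from it:

```python
import numpy as np

# The three rows of predict_proba output shown above.
result = np.array([
    [4.65761066e-03, 9.95342389e-01],
    [9.75851270e-01, 2.41487300e-02],
    [9.99983374e-01, 1.66258341e-05],
])

# Second column: probability of the class in sentiment_model.classes_[1]
# (assumed here to be the positive label, +1).
positive_proba = result[:, 1]

# argsort returns the row indices that would sort that column, smallest first.
ascending_order = np.argsort(positive_proba)

print(ascending_order)  # [2 1 0]
```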
Here is an example of what the SFrame data looks like when I print it with the print function:
+-------------------------------+-------------------------------+--------+
| name | review | rating |
+-------------------------------+-------------------------------+--------+
| Our Baby Girl Memory Book | Absolutely love it and all... | 5.0 |
| Wall Decor Removable Decal... | Would not purchase again o... | 2.0 |
| New Style Trailing Cherry ... | Was so excited to get this... | 1.0 |
+-------------------------------+-------------------------------+--------+
+-------------------------------+-----------+
| review_clean | sentiment |
+-------------------------------+-----------+
| Absolutely love it and all... | 1 |
| Would not purchase again o... | -1 |
| Was so excited to get this... | -1 |
+-------------------------------+-----------+
So I need a function that sorts the matrix according to the results of the predict_proba function.
Question: How can I sort it like that?
sorted(test_matrix)
results in:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
sorted(test_matrix_complete, key=lambda x: sentiment_model.predict_proba(x))
also results in:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
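Both errors come from sorted() needing a plain true/false answer for each comparison. Iterating over a csr_matrix yields 1×N sparse rows, and comparing two of them with < gives another sparse matrix rather than a bool. In the second attempt, the key returns a 1×2 array per row, so the comparison is again elementwise. A sketch of a variant that does work: sort the row indices with a scalar key (one float per row), then index the matrix with them. The 4×3 count matrix and the probabilities are hypothetical:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical count matrix: 4 reviews (rows) x 3 words (columns).
X = csr_matrix(np.array([
    [1, 0, 2],
    [0, 3, 0],
    [4, 0, 0],
    [0, 1, 1],
]))

# Hypothetical predict_proba output for those 4 rows.
proba = np.array([
    [0.10, 0.90],
    [0.80, 0.20],
    [0.40, 0.60],
    [0.95, 0.05],
])

# Sort row *indices*, using a single float per row as the key.
order = sorted(range(X.shape[0]), key=lambda i: proba[i, 1])

# A CSR matrix accepts a list of row indices and returns those rows in that order.
X_sorted = X[order]

print(order)  # [3, 1, 2, 0]
print(X_sorted.toarray())
```

Here np.argsort(proba[:, 1]) computes the same order without a Python-level loop.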
I guess the main problem is that I don't know how to efficiently connect the SFrame's data to the rows of the sparse matrix.
Answer:
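You can index the matrix with the row indices that sort the predicted probabilities. With result = sentiment_model.predict_proba(test_matrix), take its second column so that np.argsort returns one index per row:
import numpy as np
sorted_matrix = test_matrix[np.argsort(result[:, 1])]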
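On connecting the SFrame to the sparse matrix: vectorizer.transform emits one row per review, in the order of test_data['review_clean'], so row i of test_matrix, row i of result and row i of test_data all describe the same review. The same index array therefore reorders all of them. A minimal sketch with hypothetical data, where a plain list of review texts stands in for the SFrame column and the order is reversed so the most positive review comes first:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical stand-ins: review texts in SFrame row order, their count matrix,
# and the predict_proba output for the same rows.
reviews = [
    "Absolutely love it and all...",
    "Would not purchase again o...",
    "Was so excited to get this...",
]
test_matrix = csr_matrix(np.array([
    [2, 0, 1],
    [0, 1, 0],
    [0, 2, 1],
]))
result = np.array([
    [0.05, 0.95],
    [0.90, 0.10],
    [0.70, 0.30],
])

# Row indices sorted by the second column, reversed so the highest probability is first.
order = np.argsort(result[:, 1])[::-1]

# Apply the same permutation to every row-aligned object.
sorted_matrix = test_matrix[order]
sorted_reviews = [reviews[i] for i in order]
sorted_result = result[order]

for text, p in zip(sorted_reviews, sorted_result[:, 1]):
    print(f"{p:.2f}  {text}")
```

Another option, assuming the usual SFrame API, is to add result[:, 1] as a new column of test_data and sort on it with SFrame.sort.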