Zelphir Kaltstahl

Reputation: 6189

How to sort scipy.sparse.csr.csr_matrix according to result of another function?

I am learning about machine learning, more precisely about logistic regression / classification. In my code I have a <class 'scipy.sparse.csr.csr_matrix'> object. I need to sort this sparse matrix, or the SFrame from which it was generated (using ...), according to the result of LogisticRegression.predict_proba. To be precise, the sort key is the second column of the array that predict_proba returns.

How I generated the sparse matrix:

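import sframe
from sklearn.feature_extraction.text import CountVectorizer

products = sframe.SFrame('...')

train_data, test_data = products.random_split(.8, seed=1)

vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
# The vectorizer has to be fitted before transform() can be called;
# fitting on the training reviews is assumed here.
train_matrix = vectorizer.fit_transform(train_data['review_clean'])
test_matrix = vectorizer.transform(test_data['review_clean'])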

How I calculate the probabilities:

sentiment_model.predict_proba(test_matrix)

(Here sentiment_model is a classifier trained with logistic regression.) This gives me a <class 'numpy.ndarray'>, which looks like this:

[[  4.65761066e-03   9.95342389e-01]
 [  9.75851270e-01   2.41487300e-02]
 [  9.99983374e-01   1.66258341e-05]]

Here is an example of what the SFrame data looks like when I print it using the print function:

+-------------------------------+-------------------------------+--------+
|              name             |             review            | rating |
+-------------------------------+-------------------------------+--------+
|   Our Baby Girl Memory Book   | Absolutely love it and all... |  5.0   |
| Wall Decor Removable Decal... | Would not purchase again o... |  2.0   |
| New Style Trailing Cherry ... | Was so excited to get this... |  1.0   |
+-------------------------------+-------------------------------+--------+
+-------------------------------+-----------+
|          review_clean         | sentiment |
+-------------------------------+-----------+
| Absolutely love it and all... |     1     |
| Would not purchase again o... |     -1    |
| Was so excited to get this... |     -1    |
+-------------------------------+-----------+

So I need some function that can sort that matrix according to the results of the predict_proba function.

Question: How can I sort it like that?

What I already tried

sorted(test_matrix)

results in:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().

sorted(test_matrix_complete, key=lambda x: sentiment_model.predict_proba(x))

also results in:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I guess the main problem is that I don't know how to connect the SFrame's data to the sparse matrix efficiently.

Upvotes: 1

Views: 4043

Answers (1)

Forzaa

Reputation: 1545

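You can simply index by the sorted indices of the result. Because predict_proba returns one column per class, take the second column first; passing the whole two-dimensional array to np.argsort would sort within each row instead of ranking the rows. With numpy imported as np:

result = sentiment_model.predict_proba(test_matrix)[:, 1]  # second column only
sorted_matrix = test_matrix[np.argsort(result)]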
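As a rough, self-contained sketch of the idea, the snippet below uses small made-up stand-ins: a 3x4 count matrix in place of the real test_matrix, a hard-coded array (values rounded from the question) in place of the predict_proba output, and a plain Python list in place of the review texts. The names proba, order, reviews and descending are illustrative only and are not part of the original code.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Stand-in for test_matrix: 3 documents, 4 vocabulary terms (made-up counts).
test_matrix = csr_matrix(np.array([
    [1, 0, 2, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 4],
]))

# Stand-in for sentiment_model.predict_proba(test_matrix): one row per
# document, one column per class (values rounded from the question).
proba = np.array([
    [0.0047, 0.9953],
    [0.9759, 0.0241],
    [0.99998, 0.00002],
])

# Sort key: the second column only. argsort returns the row positions
# in ascending order of that column.
order = np.argsort(proba[:, 1])

# Indexing a CSR matrix with an integer array selects rows in that order.
sorted_matrix = test_matrix[order]

# The same positions reorder any row-aligned data, e.g. the review texts.
reviews = ["Absolutely love it", "Would not purchase again", "Was so excited"]
sorted_reviews = [reviews[i] for i in order]

# Reverse the positions to get the highest probability first.
descending = order[::-1]
```

Keeping the order array around is what links the sparse matrix back to the original rows: any data that is aligned row by row with test_matrix can be reordered with the same positions.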

Upvotes: 3
