user9287749

Reputation: 53

How to create a 2D array in Python

I'm trying to create a function called "words_in_texts" that returns a result like this:

words_in_texts(['hello', 'bye', 'world'],
               pd.Series(['hello', 'hello world hello']))

array([[1, 0, 0],
       [1, 0, 1]])

I believe the function should take two arguments: a list of all the words and a Series of texts.

def words_in_texts(words, texts):
    '''
    Args:
        words (list-like): words to find
        texts (Series): strings to search in

    Returns:
        NumPy array of 0s and 1s with shape (n, p) where n is the
        number of texts and p is the number of words.
    '''
    indicator_array = texts.str.contains(words)

    return indicator_array

I'm confused about how to create the 2D array result. Can anyone please help me with this? Thank you in advance!

Upvotes: 1

Views: 230

Answers (1)

MaxU - stand with Ukraine

Reputation: 210832

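Use sklearn.feature_extraction.text.CountVectorizer with a fixed vocabulary and binary=True, so that each word is marked 1 if it is present rather than counted: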

In [52]: from sklearn.feature_extraction.text import CountVectorizer

In [53]: vect = CountVectorizer(vocabulary=['hello', 'bye', 'world'], binary=True)

In [54]: X = vect.fit_transform(pd.Series(['hello', 'hello world hello']))

The result is a sparse matrix:

In [55]: X
Out[55]:
<2x3 sparse matrix of type '<class 'numpy.int64'>'
        with 3 stored elements in Compressed Sparse Row format>

You can convert it to a dense array with toarray() (X.A is an older shorthand for the same thing):

In [56]: X.toarray()
Out[56]:
array([[1, 0, 0],
       [1, 0, 1]], dtype=int64)
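
To tie this back to the question, the steps above could be wrapped into the requested function. This is only a minimal sketch: it reuses the question's name `words_in_texts`, and it assumes the default CountVectorizer tokenizer is acceptable, which lowercases the texts and ignores single-character tokens.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer


def words_in_texts(words, texts):
    """Return an (n, p) array of 0s and 1s: one row per text, one column per word."""
    # A fixed vocabulary keeps the columns in the same order as `words`;
    # binary=True records presence (1) instead of the number of occurrences.
    vect = CountVectorizer(vocabulary=list(words), binary=True)
    # fit_transform returns a sparse matrix; toarray() makes it a dense NumPy array.
    return vect.fit_transform(texts).toarray()


result = words_in_texts(['hello', 'bye', 'world'],
                        pd.Series(['hello', 'hello world hello']))
print(result)
```

Because the texts are lowercased by default, the words passed in should be lowercase as well, otherwise they will never match.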

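Features (column names). Note that get_feature_names() was removed in scikit-learn 1.2; newer versions use get_feature_names_out() instead: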

In [57]: vect.get_feature_names()
Out[57]: ['hello', 'bye', 'world']
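
If you would rather stay close to the `texts.str.contains` attempt in the question and avoid scikit-learn, one possible sketch is to call `str.contains` once per word and stack the results as columns. This is an alternative, not the method shown above, and it behaves differently: it matches substrings (so 'hell' would match 'hello') and is case-sensitive. Using `regex=False` and `na=False` is an assumption about what is wanted here.

```python
import numpy as np
import pandas as pd


def words_in_texts(words, texts):
    """Return an (n, p) array of 0s and 1s using plain substring matching."""
    # str.contains takes a single pattern, so call it once per word.
    # regex=False treats each word literally; na=False maps missing texts to False.
    columns = [texts.str.contains(word, regex=False, na=False).to_numpy()
               for word in words]
    # Each boolean vector becomes one column; cast True/False to 1/0.
    return np.column_stack(columns).astype(int)


result = words_in_texts(['hello', 'bye', 'world'],
                        pd.Series(['hello', 'hello world hello']))
print(result)
```

The column order follows the order of `words`, and the row order follows the Series.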

Upvotes: 2
