Empty vocabulary for single letter by CountVectorizer

Question

I'm trying to convert a string into a numeric vector with scikit-learn's CountVectorizer:

import re
from sklearn.feature_extraction.text import CountVectorizer

### Clean the string
def names_to_words(names):
    # Replace every non-letter with a space, lowercase, and split on whitespace
    words = re.sub("[^a-zA-Z]", " ", names).lower().split()
    return words


### Vectorization
def Vectorizer():
    vectorizer = CountVectorizer(
        analyzer="word",
        tokenizer=None,
        preprocessor=None,
        stop_words=None,
        max_features=5000)
    return vectorizer


### Test a string
s = 'abc...'
r = names_to_words(s)
# r is a list of words; fit_transform treats each word as a separate document
feature = Vectorizer().fit_transform(r).toarray()

But when the cleaned word list is:

['g', 'o', 'm', 'd']

I get this error:

ValueError: empty vocabulary; perhaps the documents only contain stop words
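A minimal sketch of why this happens, assuming CountVectorizer's documented default token_pattern of r"(?u)\b\w\w+\b", which only keeps tokens of two or more word characters. It uses plain re to mimic that default tokenization; the helper default_tokens is hypothetical and not part of scikit-learn:

```python
import re

# Assumption: this mirrors CountVectorizer's default token_pattern,
# which only matches tokens of 2 or more word characters.
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"


def default_tokens(doc):
    """Hypothetical helper: tokenize one document the way the default pattern would."""
    return re.findall(DEFAULT_TOKEN_PATTERN, doc.lower())


words = ['g', 'o', 'm', 'd']

# Each word is its own "document", exactly as in fit_transform(r)
tokens_per_doc = [default_tokens(w) for w in words]
print(tokens_per_doc)  # [[], [], [], []]

# No document contributes a token, so the vocabulary ends up empty
vocabulary = {t for toks in tokens_per_doc for t in toks}
print(len(vocabulary))  # 0
```

A word like 'abc' does produce a token under the same pattern, which is why the 'abc...' test string works while the single-letter list fails.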

It seems there's a problem with single-letter strings like these. What should I do? Thanks.
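One possible fix, sketched under the assumption that single letters should count as features: pass a token_pattern that also accepts one-character tokens. token_pattern is a real CountVectorizer parameter and only applies when analyzer="word" (the default); the pattern r"(?u)\b\w+\b" used here is simply the default with the two-character minimum relaxed.

```python
from sklearn.feature_extraction.text import CountVectorizer

words = ['g', 'o', 'm', 'd']

# Same idea as the question's vectorizer, but with a token pattern that
# keeps single-character tokens (the default requires at least two).
vectorizer = CountVectorizer(
    analyzer="word",
    token_pattern=r"(?u)\b\w+\b",
    max_features=5000)

# Each list element is still a separate document, so this is a 4 x 4 matrix
feature = vectorizer.fit_transform(words).toarray()

# Columns are ordered by the sorted vocabulary
print(sorted(vectorizer.vocabulary_))  # ['d', 'g', 'm', 'o']
print(feature)
```

If the goal is one vector per name rather than one row per letter, passing the cleaned words as a single document, e.g. [' '.join(words)], yields a 1 x 4 row instead.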
