Malte Susen

Reputation: 845

TypeError: must be str, not list - list of words

For a current project, I want to count the occurrences of several specific words within a data set.

However, the line count = word.count(wordlist) raises the error TypeError: must be str, not list. Is there a way to have Python accept a list of words, so that it checks for several words rather than just one?

The corresponding code looks like this:

# Word frequency analysis
def get_top_n_bigram(corpus, n=None):
    vec = CountVectorizer(ngram_range=(2, 2), stop_words='english').fit(corpus)
    bag_of_words = vec.transform(corpus)
    sum_words = bag_of_words.sum(axis=0)
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
    return words_freq[:n]


# Analysis loops running through different string sections
for i in ['Text_Pro','Text_Con','Text_Main']:
    common_words = get_top_n_bigram(df[i], 500)
    for word, freq in common_words:
        print(word, freq)


# Analysis loops checking if specific words are found
    for word in common_words:
        wordlist = ['good', 'management', 'bad']
        count = word.count(wordlist)
        print(count)

Upvotes: 2

Views: 1654

Answers (3)

Karl Knechtel

Reputation: 61526

Since we need to examine every word regardless, we may as well build the entire histogram of word frequency, and then extract the word counts that we're interested in:

from collections import Counter

def words_matching(sentence, candidates):
    histogram = Counter(sentence.split())
    return sum(histogram[word] for word in candidates)
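
As a quick usage sketch (the sample sentence and the per_word_counts helper are made up for illustration, and the split is whitespace-based and case-sensitive), the same histogram can give either a combined total or a per-word breakdown:

```python
from collections import Counter

def words_matching(sentence, candidates):
    # Build the full word histogram once, then sum the candidates' counts
    histogram = Counter(sentence.split())
    return sum(histogram[word] for word in candidates)

def per_word_counts(sentence, candidates):
    # Hypothetical helper: same histogram, but one count per candidate word
    histogram = Counter(sentence.split())
    return {word: histogram[word] for word in candidates}

text = "good management beats bad management"  # made-up sample text
wordlist = ["good", "management", "bad"]

total = words_matching(text, wordlist)
breakdown = per_word_counts(text, wordlist)
print(total)
print(breakdown)
```

A Counter returns 0 for missing keys, so candidates that never appear in the text do not raise a KeyError.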

Upvotes: 1

Gustav Rasmussen

Reputation: 3961

Use a list comprehension:

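count = 0
wordlist = ['good', 'management', 'bad']
for word, freq in common_words:  # each entry is a (bigram, frequency) tuple
    # str.count counts substring matches of each target word in the bigram
    count += sum([word.count(i) for i in wordlist])

print(count)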

As a dictionary, per request from the comment section to this answer:

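count = {}
wordlist = ["good", "management", "bad"]
for word, freq in common_words:  # each entry is a (bigram, frequency) tuple
    # Map each bigram string to the number of target-word matches it contains
    count[word] = sum([word.count(i) for i in wordlist])

print(count)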
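
One caveat worth knowing: str.count matches substrings, so 'good' is also found inside 'goodness'. If only whole words should count, a regular expression with word boundaries is one option. This is just a sketch; count_whole_words and the sample string are hypothetical and not part of the question's code:

```python
import re

def count_whole_words(text, wordlist):
    # Hypothetical helper: \b restricts each alternative to whole-word matches
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, wordlist)) + r")\b")
    return len(pattern.findall(text))

wordlist = ["good", "management", "bad"]
sample = "goodness knows good management is not bad"  # made-up sample text

# Substring counting also picks up the "good" inside "goodness"
substring_total = sum(sample.count(w) for w in wordlist)
# Whole-word counting ignores it
whole_word_total = count_whole_words(sample, wordlist)

print(substring_total, whole_word_total)
```

re.escape keeps any special characters in the search words from being interpreted as regex syntax.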

Upvotes: 1

Carcigenicate

Reputation: 45750

I'd go with the more efficient, but more verbose, approach of checking against a set in an old-fashioned loop:

from typing import Iterable

def count_many(string: str, words: Iterable[str]) -> int:
    search_set = set(words)  # To ease lookups
    split = string.split()  # Cut into words

    count = 0
    for word in split:
        if word in search_set:
            count += 1

    return count

>>> count_many("hello world hello no world hello", ["hello", "world"])
5

Put the words to look up in a set for faster lookups, split the source text into words, then just loop and count.

Regardless of how many words you search for, this takes only two passes over the source text: one to split it and one to count.
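
The same set-lookup idea can be applied to the (bigram, frequency) tuples that the question's get_top_n_bigram returns. This is only a sketch under assumptions: count_in_bigrams is a hypothetical helper, and the common_words values below are made up rather than real output:

```python
def count_in_bigrams(common_words, wordlist):
    # Hypothetical helper: add each bigram's frequency to every target word it contains
    search_set = set(wordlist)
    totals = {word: 0 for word in search_set}
    for bigram, freq in common_words:
        for token in bigram.split():
            if token in search_set:
                totals[token] += freq
    return totals

# Made-up stand-in for the list of (bigram, frequency) tuples
common_words = [("good management", 12), ("bad pay", 7), ("work life", 5)]
wordlist = ["good", "management", "bad"]

totals = count_in_bigrams(common_words, wordlist)
print(totals)
```

Note that a word in the middle of a text belongs to two bigrams, so these are bigram-level totals rather than raw word counts.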

Upvotes: 1
