Compare words and return Pandas DataFrame entry

Question

I am planning to set up a simple function that checks whether words from a wordlist can be found in a Pandas DataFrame, common_words. For each match, I would like to return the corresponding DataFrame entry. The entries have the format life balance 14, long term 9, upper management 9, i.e. the word token followed by its occurrence count.

However, the code below currently prints only the search term from the wordlist (e.g. life balance), not the DataFrame entry that includes the occurrence count. I therefore need a way to return word instead of the wordlist element. Where is my error in reasoning?

The relevant code section is:

    # Check for matches between wordlist and Pandas dataframe
    def wordcheck():
        wordlist = ["work balance", "good management", "work life"]
        for x in wordlist:
            if df[i].str.contains(x).any():
                print('Group 1:', x)
    wordcheck()
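
One thing that may explain the behaviour: the snippet tests `df[i].str.contains(x)`, i.e. the raw review text, and then prints `x`, the wordlist element itself. The counts live in `common_words`, which in the full code is the list of `(word, freq)` tuples returned by `get_top_n_bigram` rather than a DataFrame. A minimal sketch of an exact-match lookup against that list follows; the sample data and the two-argument signature of `wordcheck` are assumptions made for illustration, not part of the original code.

```python
# Hypothetical sample standing in for the output of get_top_n_bigram(df[i], 500)
common_words = [("life balance", 14), ("long term", 9), ("upper management", 9)]


def wordcheck(common_words, wordlist):
    """Return the (word, freq) entries whose bigram appears in wordlist."""
    wanted = set(wordlist)  # set membership keeps the lookup cheap
    return [(word, freq) for word, freq in common_words if word in wanted]


# Illustrative wordlist; "good management" has no entry and is skipped
matches = wordcheck(common_words, ["life balance", "good management", "upper management"])
for word, freq in matches:
    print("Group 1:", word, freq)
```

Because the function returns the matching entries instead of printing inside the loop, the caller can decide whether to print them, collect them per section, or build a DataFrame from them.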

The full code segment looks as follows:

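# Imports needed by the code below
import json

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Loading and normalising the input file
with open("glassdoor_A.json", "r") as file:
    data = json.load(file)
df = pd.json_normalize(data)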


# Datetime conversion
df['Date'] = pd.to_datetime(df['Date'])
# Adding of 'Quarter' column
df['Quarter'] = df['Date'].dt.to_period('Q')


# Word frequency analysis
def get_top_n_bigram(corpus, n=None):
    vec = CountVectorizer(ngram_range=(2, 2), stop_words='english').fit(corpus)
    bag_of_words = vec.transform(corpus)
    sum_words = bag_of_words.sum(axis=0)
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
    return words_freq[:n]


# Analysis loops through different qualitative sections
for i in ['Text_Pro','Text_Con','Text_Main']:
    common_words = get_top_n_bigram(df[i], 500)
    for word, freq in common_words:
        print(word, freq)


    # Check for matches between wordlist and Pandas dataframe
    def wordcheck():
        wordlist = ["work balance", "good management", "work life"]
        for x in wordlist:
            if df[i].str.contains(x).any():
                print('Group 1:', x)
    wordcheck()
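
If the goal is to get actual DataFrame rows back, one possible approach is to turn the `(word, freq)` list into a DataFrame first and filter it with `Series.str.contains`. The sketch below is only that, a sketch: the column names `bigram` and `count`, the sample data, and the choice of substring matching (so that "balance" also finds "life balance") are all assumptions, not something the original code defines.

```python
import re

import pandas as pd

# Hypothetical sample standing in for get_top_n_bigram(df[i], 500)
common_words = [("life balance", 14), ("long term", 9), ("upper management", 9)]

# Assumed column names; the original code never builds this frame
freq_df = pd.DataFrame(common_words, columns=["bigram", "count"])


def wordcheck(freq_df, wordlist):
    """Return the rows of freq_df whose bigram contains any wordlist term."""
    # Escape each term so it is matched literally, then OR the terms together
    pattern = "|".join(re.escape(term) for term in wordlist)
    mask = freq_df["bigram"].str.contains(pattern, regex=True)
    return freq_df[mask]


hits = wordcheck(freq_df, ["balance", "management"])
print(hits)
```

For exact matches only, `freq_df["bigram"].isin(wordlist)` could replace the `str.contains` mask.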
