Alan

Reputation: 17

How to use a SQL-like where clause in Python

I have a corpus of text that needs to be analysed. It is stored in a data frame with the following columns:

print(df.columns.values)
>>>> ['Unique ID' 'Quarter' 'Theme' 'Subtheme' 'Driver' 'Ticker' 'Company'
'Sub-sector' 'Issue weight' 'Quote' 'Executive name' 'Designation'
'Quote_len' 'word_count']

I have written a function to find the top 20 words in the 'Quote' column after removing stop words.

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

def get_top_n_words(corpus, n=None):
    # Build the vocabulary, dropping English stop words
    vec = CountVectorizer(stop_words='english').fit(corpus)
    bag_of_words = vec.transform(corpus)
    # Total occurrences of each word across all documents
    sum_words = bag_of_words.sum(axis=0)
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
    return words_freq[:n]

common_words = get_top_n_words(df['Quote'].values.astype('U'), 20)
for word, freq in common_words:
    print(word, freq)

df2 = pd.DataFrame(common_words, columns=['ReviewText', 'count'])
# .iplot() is not part of pandas itself; it is assumed here to come from the cufflinks package
df2.groupby('ReviewText').sum()['count'].sort_values(ascending=False).iplot(
    kind='bar', yTitle='Count', linecolor='black', title='Top 20 words in review after removing stop words')

Now I wish to use a where clause in the code so that the results cover only rows with a given value in the "Theme" column.

For example, Theme = 'Competitive advantage'.

How can I do that?

Upvotes: 0

Views: 53

Answers (1)

joejoemac

Reputation: 165

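Use DataFrame.loc[...] with a boolean condition to filter the rows.

For example: df = df.loc[df.Theme == 'Competitive advantage']. Note that this overwrites df; assign the result to a new name if you still need the full data.

Then continue with common_words = get_top_n_words(df['Quote'].values.astype('U'), 20). The dataframe now contains only the rows where Theme == 'Competitive advantage', so the word counts are computed for that theme alone.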
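A minimal sketch of the filtering step on its own, using a small made-up frame in place of the real data (the rows and the names theme_df, via_query and subset are illustrative assumptions, not from the question). It shows the boolean mask, an equivalent DataFrame.query call, and isin for matching several themes at once, roughly like SQL's WHERE ... IN (...).

```python
import pandas as pd

# Toy stand-in for the real frame; only the two relevant columns are included.
df = pd.DataFrame({
    "Theme": ["Competitive advantage", "Pricing", "Competitive advantage"],
    "Quote": ["Our scale is a moat", "Prices rose in Q2", "Our brand is a moat"],
})

# Boolean mask: True where Theme matches, like WHERE Theme = '...'
mask = df["Theme"] == "Competitive advantage"

# Assign the subset to a new name so the full frame stays available.
theme_df = df.loc[mask]

# The same filter written as a query string.
via_query = df.query("Theme == 'Competitive advantage'")

# Several values at once, like WHERE Theme IN (...)
subset = df.loc[df["Theme"].isin(["Competitive advantage", "Pricing"])]

# Same conversion the question applies before calling get_top_n_words.
quotes = theme_df["Quote"].values.astype("U")
print(theme_df)
```

The quotes array can then be passed to get_top_n_words as before. Bracket access (df["Theme"]) is used here because attribute access such as df.Theme does not work for column names containing spaces, for instance 'Issue weight'.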

Upvotes: 1
