Reputation: 483
A common task in sentiment analysis is to obtain the count of words within a Pandas data frame cell and create a new column based on that count. How do I do this?
Upvotes: 8
Views: 14806
Reputation: 1424
Assuming that a sentence with n words has n-1 spaces in it, there's another solution:
df['new_column'] = df['count_column'].str.count(' ') + 1
This solution is probably faster, because it does not split each string into a list.
If count_column contains empty strings, the result needs to be adjusted, because an empty string would otherwise be counted as one word (this requires import numpy as np):
df['new_column'] = np.where(df['count_column'] == '', 0, df['new_column'])
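As a minimal sketch of the two steps above, here they are on a small made-up frame. The sample sentences are hypothetical; the column names follow the ones used in this answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'count_column' mirrors the name used above.
df = pd.DataFrame({'count_column': ['good movie', '', 'really not my thing']})

# n words separated by single spaces contain n - 1 spaces.
df['new_column'] = df['count_column'].str.count(' ') + 1

# An empty string holds no words, so replace its count of 1 with 0.
df['new_column'] = np.where(df['count_column'] == '', 0, df['new_column'])

print(df['new_column'].tolist())  # [2, 0, 4]
```

Note that this still assumes exactly one space between words; leading, trailing, or repeated spaces would inflate the count.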
Upvotes: 10
Reputation: 320
For a DataFrame df, remove punctuation from the selected column:
import string
df['reviews'] = df['reviews'].str.translate(str.maketrans('', '', string.punctuation))
Get the word count (assuming word_tokenize is the NLTK tokenizer):
from nltk.tokenize import word_tokenize
df['review_word_count'] = df['reviews'].apply(word_tokenize).apply(len)
Write the result, including the new column, to a CSV file:
df.to_csv('./data/dataset.csv')
Upvotes: 2
Reputation: 7903
from collections import Counter
# Counter maps each word in the cell to how often it occurs; split() breaks the string on whitespace.
df['new_column'] = df['count_column'].apply(lambda x: Counter(str(x).split()))
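A Counter gives per-word frequencies rather than a single number, so a total word count needs one more step. The following is a rough sketch, assuming each cell holds a plain string; the sample data and the word_freq column name are made up for illustration.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data; 'word_freq' is an illustrative column name.
df = pd.DataFrame({'count_column': ['the cat saw the dog', 'hello']})

# Per-cell mapping of word -> number of occurrences.
df['word_freq'] = df['count_column'].apply(lambda x: Counter(str(x).split()))

# Summing the frequencies gives the total number of words in the cell.
df['new_column'] = df['word_freq'].apply(lambda c: sum(c.values()))

print(df['new_column'].tolist())  # [5, 1]
```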
Upvotes: 0
Reputation: 483
Let's say you have a dataframe df that you've generated using
df = pandas.read_csv('dataset.csv')
You would then add a new column with the word count by doing the following:
df['new_column'] = df.columnToCount.apply(lambda x: len(str(x).split(' ')))
The space passed to split matters because it is the delimiter that separates words. You may also want to lowercase the text and remove numbers and punctuation before counting:
df = df.apply(lambda x: x.astype(str).str.lower())
df = df.replace(r'\d+', '', regex=True)
df = df.replace(r'[^\w\s\+]', '', regex=True)
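Putting the cleaning and counting together might look like the sketch below. It is only one possible arrangement: the sample rows are made up, and it swaps split(' ') for the pandas .str.split() with no argument, since removing digits can leave two spaces in a row and splitting on a single space would then count an empty string as a word.

```python
import pandas as pd

# Hypothetical sample data standing in for dataset.csv.
df = pd.DataFrame({'columnToCount': ['I watched it 3 times!', 'Great, great film.']})

cleaned = (
    df['columnToCount']
    .astype(str)
    .str.lower()
    .str.replace(r'\d+', '', regex=True)      # drop digits
    .str.replace(r'[^\w\s]', '', regex=True)  # drop punctuation
)

# With no argument, split() treats any run of whitespace as one delimiter.
df['new_column'] = cleaned.str.split().str.len()

print(df['new_column'].tolist())  # [4, 3]
```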
Upvotes: 6