muninn

Reputation: 483

How do I count the total number of words in a Pandas dataframe cell and add those to a new column?

A common task in sentiment analysis is counting the words in each cell of a pandas DataFrame column and storing that count in a new column. How do I do this?

Upvotes: 8

Views: 14806

Answers (4)

altabq

Reputation: 1424

Assuming that a sentence with n words has n-1 spaces in it, there's another solution:

df['new_column'] = df['count_column'].str.count(' ') + 1

This solution is probably faster, because it does not split each string into a list.

If count_column contains empty strings, the formula above reports 1 for them, so the result needs to be adjusted:

import numpy as np
df['new_column'] = np.where(df['count_column'] == '', 0, df['new_column'])
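
A minimal sketch of where the space-counting assumption holds and where it breaks. The sample strings are made up for illustration, and the split-based column is only shown as a comparison, not as part of the answer above:

```python
import pandas as pd

# Hypothetical sample data; 'count_column' follows the naming used above.
df = pd.DataFrame(
    {"count_column": ["good movie", "", "too  many   spaces", " padded "]}
)

# Space counting: correct only when words are separated by exactly one space
# and the string has no leading or trailing spaces.
df["by_spaces"] = df["count_column"].str.count(" ") + 1

# Whitespace splitting: split() with no argument collapses runs of whitespace
# and returns an empty list for an empty string.
df["by_split"] = df["count_column"].str.split().str.len()

print(df)
```

The two columns agree on the first row only, so the faster space count is worth using mainly when the text has already been normalised to single spaces.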

Upvotes: 10

Isurie

Reputation: 320

For a DataFrame df, first remove punctuation from the selected column:

import string

# Delete every ASCII punctuation character from each review
df['reviews'] = df['reviews'].str.translate(str.maketrans('', '', string.punctuation))

Get the word count:

from nltk.tokenize import word_tokenize

# Tokenize each review and store the number of tokens
df['review_word_count'] = df['reviews'].apply(lambda text: len(word_tokenize(text)))

Write to a CSV with new column:

df.to_csv('./data/dataset.csv')
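
The strip-then-count steps above can be sketched without NLTK. This is only an approximation: the plain whitespace split is an assumption standing in for word_tokenize, and the sample reviews are invented:

```python
import string

import pandas as pd

# Hypothetical sample data; 'reviews' follows the column name used above.
df = pd.DataFrame({"reviews": ["Great film, loved it!", "Meh...", "Don't bother."]})

# Translation table that deletes every ASCII punctuation character.
table = str.maketrans("", "", string.punctuation)
cleaned = df["reviews"].str.translate(table)

# Whitespace split used here as a stand-in for nltk's word_tokenize.
df["review_word_count"] = cleaned.str.split().str.len()

print(df)
```

Removing punctuation before counting matters more with a real tokenizer, since tokenizers commonly emit punctuation marks as tokens of their own, which would inflate the count.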

Upvotes: 2

A.Kot

Reputation: 7903

from collections import Counter

# Count each distinct word, then sum the counts to get the total per cell
df['new_column'] = df['count_column'].apply(lambda x: sum(Counter(str(x).split()).values()))
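
For what Counter adds here, a small sketch on a single cell value, assuming each cell holds one plain string (the sample text is made up):

```python
from collections import Counter

text = "the cat saw the other cat"  # hypothetical cell value

# Per-word frequencies for the cell.
freq = Counter(text.split())

# Summing the frequencies gives the total number of words.
total = sum(freq.values())

print(freq.most_common(2), total)
```

If only the total is needed, len(text.split()) gives the same number; Counter is useful when the per-word frequencies are wanted as well.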

Upvotes: 0

muninn

Reputation: 483

Let's say you have a dataframe df that you've generated using

import pandas
df = pandas.read_csv('dataset.csv')

You would then add a new column with the word count by doing the following:

df['new_column'] = df.columnToCount.apply(lambda x: len(str(x).split(' ')))

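Keep in mind that the ' ' argument makes split break the string at every single space, so consecutive spaces produce empty strings that are counted as words; calling split() with no argument splits on any run of whitespace instead. You may also want to convert the text to lowercase and remove numbers or punctuation before counting: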

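# Lowercase every column (values are cast to strings first)
df = df.apply(lambda x: x.astype(str).str.lower())
# Remove digits, then anything that is not a word character, whitespace, or '+'
df = df.replace(r'\d+', '', regex=True)
df = df.replace(r'[^\w\s\+]', '', regex=True)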
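
Putting the cleaning and counting together for a single column might look like the sketch below. The sample rows are invented, and it uses split() with no argument (a swap from the split(' ') shown earlier) because removing digits can leave stray spaces behind:

```python
import pandas as pd

# Hypothetical sample data; 'columnToCount' follows the name used above.
df = pd.DataFrame(
    {"columnToCount": ["Loved it 100%!", "Worst. Movie. Ever.", "2 stars"]}
)

cleaned = (
    df["columnToCount"]
    .astype(str)
    .str.lower()
    .str.replace(r"\d+", "", regex=True)        # drop digits
    .str.replace(r"[^\w\s\+]", "", regex=True)  # drop punctuation except '+'
)

# Clean first, then count, so digits and punctuation do not count as words.
df["new_column"] = cleaned.str.split().str.len()

print(df)
```

In the last row, "2 stars" becomes " stars" after cleaning; split(' ') would report two words there, while split() reports one.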

Upvotes: 6
