How do I calculate number of words and number of unique words contained within a list of a column across all rows of my dataframe?

Question

I generated a column df['adjectives'] in my pandas dataframe that has a list of all the adjectives from another column, df['reviews'].

The values of df['adjectives'] are in this format, for example:

['excellent', 'better', 'big', 'unexpected', 'excellent', 'big']

I would like to create a new column that counts the total number of words in df['adjectives'] as well as the number of 'unique' words in df['adjectives'].

The function should iterate across the entire dataframe and apply the counts for each row.

For the above row example, I would want df['totaladj'] to be 6 and df['uniqueadj'] to be 4 (since 'excellent' and 'big' are repeated)

import pandas as pd

df=pd.read_csv('./data.csv')

df['totaladj'] = df['adjectives'].str.count(' ') + 1

df.to_csv('./data.csv', index=False)

The above code works when counting the total number of adjectives, but not the unique number of adjectives.

Puwx · Accepted Answer

Is this the type of behavior that you are looking for?

Based on your description, I assumed that the values in the adjectives column are strings formatted like lists, e.g. "['big','excellent','small']"

The code below converts each string to a list using split(), and then gets the length using len(). The number of unique adjectives is found by converting the list to a set before calling len().

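# x[1:-1] drops the surrounding brackets before splitting on commas
df['adjcount'] = df['adjectives'].apply(lambda x: len(x[1:-1].split(',')))

# strip() removes the space left after each comma, so repeated words compare equal
df['uniqueadjcount'] = df['adjectives'].apply(lambda x: len({w.strip() for w in x[1:-1].split(',')}))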
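If the column really does hold list-formatted strings (which is what usually happens after a round trip through to_csv/read_csv), another option is to parse each string back into a real list with ast.literal_eval from the standard library. This is only a sketch under that assumption. The sample rows below are made up for illustration, and the column names totaladj and uniqueadj are taken from the question. Unlike splitting on commas, this also gives 0 for an empty list "[]".

```python
import ast

import pandas as pd

# Hypothetical sample data: lists stored as strings, as after a CSV round trip.
df = pd.DataFrame({
    "adjectives": [
        "['excellent', 'better', 'big', 'unexpected', 'excellent', 'big']",
        "['small']",
        "[]",
    ]
})

# Parse each string back into a real Python list.
parsed = df["adjectives"].apply(ast.literal_eval)

# Total words per row, and distinct words per row.
df["totaladj"] = parsed.apply(len)
df["uniqueadj"] = parsed.apply(lambda words: len(set(words)))

print(df[["totaladj", "uniqueadj"]])
```

If the column already contains real Python lists rather than strings, the literal_eval step can be skipped and len / len(set(...)) applied directly. Note that literal_eval raises an error on values that are not valid Python literals (missing values, for example), so those rows would need to be handled first.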
