Reputation: 101
I generated a column df['adjectives'] in my pandas dataframe that has a list of all the adjectives from another column, df['reviews'].
The values of df['adjectives'] are in this format, for example:
['excellent', 'better', 'big', 'unexpected', 'excellent', 'big']
I would like to create two new columns: one that counts the total number of words in df['adjectives'], and one that counts the number of unique words in df['adjectives'].
The function should iterate across the entire dataframe and apply the counts for each row.
For the above row example, I would want df['totaladj'] to be 6 and df['uniqueadj'] to be 4 (since 'excellent' and 'big' are repeated).
import pandas as pd
df=pd.read_csv('./data.csv')
df['totaladj'] = df['adjectives'].str.count(' ') + 1
df.to_csv('./data.csv', index=False)
The above code works when counting the total number of adjectives, but not the unique number of adjectives.
Upvotes: 0
Views: 222
Reputation: 26
Is this the type of behavior that you are looking for?
Based on your description, I assumed that the values in the adjectives column are strings formatted like lists, e.g. "['big','excellent','small']"
The code below strips the surrounding brackets, converts each string to a list using split(), and trims the whitespace around each item so that " 'big'" and "'big'" count as the same word. The total is then the length of that list, using len(). Finding the number of unique adjectives is done by converting the list to a set before using len().
words = df['adjectives'].apply(lambda x: [w.strip() for w in x[1:-1].split(',')])
df['adjcount'] = words.apply(len)
df['uniqueadjcount'] = words.apply(lambda w: len(set(w)))
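If the strings really are valid Python list literals, another option is to parse them with ast.literal_eval from the standard library instead of splitting by hand. This is only a sketch under that assumption: the helper name adjective_counts is made up for illustration, and it also assumes an empty list "[]" should count as 0 words.

```python
import ast


def adjective_counts(cell):
    """Return (total, unique) word counts for one cell of df['adjectives']."""
    # Assumption: the cell is either a real list or a string that looks like one,
    # which is how a list column usually comes back from read_csv.
    words = ast.literal_eval(cell) if isinstance(cell, str) else list(cell)
    # A set keeps one copy of each word, so its length is the unique count.
    return len(words), len(set(words))


row = "['excellent', 'better', 'big', 'unexpected', 'excellent', 'big']"
print(adjective_counts(row))  # (6, 4)
```

To fill the columns from the question, this could be applied per row, for example df['totaladj'] = df['adjectives'].apply(lambda c: adjective_counts(c)[0]), and the same with [1] for df['uniqueadj'].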
Upvotes: 1