How to keep a dictionary of dictionaries (or something with similar functionality) in pandas?

Question

So I have a large dataframe with many columns. Let us say the two main columns I am interested are messages and names. Each message will be something like a personal status and will be accompanied by the person's name. Let's say I have a word bank of emotion/feeling words which is fairly huge but a condensed version could look like ['sad', 'happy', 'relieved', 'annoyed', 'angry', 'ecstatic', 'proud', 'disappointed']. For each name (and names could be repeated since they could have multiple messages), I want to keep track of the emotion words they have used and their accompanying counts. For example (I have made this up since I have a really large dataset):

Message                                      Name
I am really happy with my progress.          Alice
I was annoyed by his inconsideration.        John
I felt proud after seeing her performance.   Lisa
I am ecstatic after hearing the good news.   Alice
I felt disappointed by her dishonesty        Lisa

If I wanted something that would essentially tell me the following (I don't necessarily care about the format as long as it's clean): {Alice: happy:1, ecstatic: 1; John: annoyed:1; Lisa: proud:1, disappointed: 1}.

What is the best/easiest way of doing this? Is a dictionary of dictionaries the best way to do this? Secondly (this is sort of an additional related question), after computing above, what if I wanted to find the most frequently used emotion word for each name. In this example everything is pretty much tied at 1, but imagine the counts were not the same.

Jan Trienes · Accepted Answer

If you are looking for a very easy way of accomplishing what you are looking for, I would suggest using the pandas groupby functionality in combination with the collections utility.

Initial Situation

df = pd.read_csv('messages.csv')
df['Emotion Words'] = df.Message.apply(extract_emotion_words)
print(df)

Output:

                                      Message   Name   Emotion Words
0         I am really happy with my progress.  Alice         [happy]
1      I am really happy with johns progress.  Alice         [happy]
2       I was annoyed by his inconsideration.   John       [annoyed]
3  I felt proud after seeing her performance.   Lisa         [proud]
4  I am ecstatic after hearing the good news.  Alice      [ecstatic]
5       I felt disappointed by her dishonesty   Lisa  [disappointed]

Group By Name and Apply Counter

from collections import Counter
df.groupby('Name')['Emotion Words'].sum().apply(Counter)

Output:

Name
Alice        {'happy': 2, 'ecstatic': 1}
John                      {'annoyed': 1}
Lisa     {'disappointed': 1, 'proud': 1}
dtype: object

Get Most Frequent Word

Since you also indicated that you want to get the most common emotion word, you can use the Counter.most_common(n) function where n is the number of most common words to extract (in your case 1):

df.groupby('Name')['Emotion Words'].sum().apply(
   lambda words: Counter(words).most_common(1)
)

Output:

Name
Alice           [(happy, 2)]
John          [(annoyed, 1)]
Lisa     [(disappointed, 1)]
dtype: object

For this little example, I used the following function to extract the emotion words. There sure might be better ways of doing it (perhaps you are already using the nltk, if not I would suggest you look into it).

import re

emotionWords = set(['sad', 'happy', 'relieved', 'annoyed', 'angry', 'ecstatic', 'proud', 'disappointed'])

def extract_emotion_words(message):
    tokenized = re.split(r'\W+', message.lower())
    return list(set(tokenized) & emotionWords)

How to keep a dictionary of dictionaries (or something with similar functionality) in pandas?

Answers (1)

Initial Situation

Group By Name and Apply Counter

Get Most Frequent Word

Related Questions