How to use my own lexicon dictionary to analyse sentences in R?

Question

I have formed a new lexicon dictionary to analyse the sentiment of sentences in R. I have used lexicon dictionaries in R before, but I am unsure how to use my own. I managed to create positive and negative lists of words, count the number of positive and negative words in a sentence, and then sum them. This does not take into account the score allocated to each word, as shown in the example below.

I would like to analyse, say, the sentence "I am happy and kind of sad". Here is an example list of words and scores (the real list would be bigger than this):

happy, 1.3455
sad, -1.0552

I would like to match these words with the sentence and take the sum of the scores, 1.3455 + -1.0552, which in this case gives an overall score of 0.2903.

How would I go about taking the actual score for each word to produce an overall score for each sentence in R, as in the example above?

Many thanks, James

s__ · Accepted Answer

You can start with the magnificent tidytext package:

library(tidytext)
library(tidyverse)

First, the data to analyze, plus a small transformation that gives each phrase an ID:

# data: one phrase per element (tibble() replaces the deprecated data_frame())
df <- tibble(text = c('I am happy and kind of sad', 'sad is sad, happy is good'))

# add an ID column called "line" that numbers each phrase
df <- tibble::rowid_to_column(df, "line")

> df
# A tibble: 2 x 2
   line text                      
  <int> <chr>                     
1     1 I am happy and kind of sad
2     2 sad is sad, happy is good 

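Then split each phrase into words, one word per row, with unnest_tokens(). It is applied to every phrase, so each word keeps the line ID of the phrase it came from (by default it also lower-cases the words and strips punctuation):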

tidy <- df %>% unnest_tokens(word, text)
> tidy
# A tibble: 13 x 2
    line word 
   <int> <chr>
 1     1 i    
 2     1 am   
 3     1 happy
 4     1 and  
 5     1 kind 
 6     1 of   
 7     1 sad  
 8     2 sad  
 9     2 is   
10     2 sad  
11     2 happy
12     2 is   
13     2 good 
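If you cannot install tidytext, or just want to see roughly what unnest_tokens() produces, here is a base-R approximation. It is only a sketch: it lower-cases the text and splits on runs of characters that are not letters, digits or apostrophes, whereas tidytext delegates to the tokenizers package, which handles punctuation and Unicode more carefully.

```r
# Same two example phrases, one ID per phrase
df <- data.frame(line = 1:2,
                 text = c("I am happy and kind of sad", "sad is sad, happy is good"),
                 stringsAsFactors = FALSE)

# Lower-case, then split each phrase on runs of non-word characters
tokens <- strsplit(tolower(df$text), "[^a-z0-9']+")

# Stack the tokens into a long data frame: one row per (line, word)
tidy <- data.frame(line = rep(df$line, lengths(tokens)),
                   word = unlist(tokens),
                   stringsAsFactors = FALSE)
tidy <- tidy[tidy$word != "", ]  # drop empty strings left by leading punctuation
tidy
```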

Now your brand new lexicon:

lexicon <- tibble(word = c('happy', 'sad'), scores = c(1.3455, -1.0552))
> lexicon
# A tibble: 2 x 2
  word  scores
  <chr>  <dbl>
1 happy   1.35
2 sad    -1.06
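A larger lexicon will probably live in a file. Assuming it is stored as plain "word, score" lines like the ones in the question, with no header row (that file layout is an assumption), read.csv() can load it. The text = argument is used here only so the example runs without a file; with a real file you would pass its path instead. The duplicate check matters because merge() would match a word listed twice two times and count its score twice.

```r
# Lexicon in the question's "word, score" layout. With a real file, replace
# text = ... by its path, e.g. "my_lexicon.csv" (hypothetical file name).
lexicon <- read.csv(text = "happy, 1.3455\nsad, -1.0552",
                    header = FALSE,
                    col.names = c("word", "scores"),
                    strip.white = TRUE,
                    stringsAsFactors = FALSE)

# A word listed twice would be matched twice by merge() and double-counted
stopifnot(!anyDuplicated(lexicon$word))
str(lexicon)
```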

Next, merge the words with the lexicon so that each matched word carries its score:

merged <- merge(tidy,lexicon, by = 'word')
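merge() performs an inner join by default, so only words that appear in the lexicon survive, each paired with its score; every other token is discarded. A small self-contained check with the tokens from above typed in by hand:

```r
tidy <- data.frame(
  line = c(rep(1, 7), rep(2, 6)),
  word = c("i", "am", "happy", "and", "kind", "of", "sad",
           "sad", "is", "sad", "happy", "is", "good"),
  stringsAsFactors = FALSE)
lexicon <- data.frame(word = c("happy", "sad"), scores = c(1.3455, -1.0552),
                      stringsAsFactors = FALSE)

# Inner join: a row is kept only where 'word' appears in both tables
merged <- merge(tidy, lexicon, by = "word")
merged
```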

Now sum the matched scores for each phrase to get its sentiment:

scoredf <- aggregate(scores ~ line, data = merged, FUN = sum)
> scoredf
  line  scores
1    1  0.2903
2    2 -0.7649
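The same per-phrase sums can also be computed with tapply(), which returns a named vector instead of a data frame (with dplyr, group_by(line) followed by summarise(scores = sum(scores)) is another option). A base-R sketch with the merged rows typed in by hand:

```r
merged <- data.frame(line   = c(1, 2, 1, 2, 2),
                     word   = c("happy", "happy", "sad", "sad", "sad"),
                     scores = c(1.3455, 1.3455, -1.0552, -1.0552, -1.0552),
                     stringsAsFactors = FALSE)

# Formula interface: sum 'scores' within each 'line'
scoredf <- aggregate(scores ~ line, data = merged, FUN = sum)

# Equivalent result as a vector named by line ID
scores_by_line <- tapply(merged$scores, merged$line, sum)

scoredf
scores_by_line
```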

Finally, merge the initial df with the scores to have phrases and scores together:

merge(df, scoredf, by = 'line')
  line                       text  scores
1    1 I am happy and kind of sad  0.2903
2    2  sad is sad, happy is good -0.7649
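One caveat: because merge() is an inner join by default, a phrase that contains no lexicon words never reaches scoredf and silently disappears from the final table. If such phrases should appear with a score of 0, a left join (all.x = TRUE) followed by replacing NA with 0 keeps them. A sketch with a hypothetical third phrase:

```r
df <- data.frame(line = 1:3,
                 text = c("I am happy and kind of sad",
                          "sad is sad, happy is good",
                          "nothing to see here"),  # hypothetical phrase, no lexicon words
                 stringsAsFactors = FALSE)
scoredf <- data.frame(line = 1:2, scores = c(0.2903, -0.7649))

# Left join: keep every phrase in df, even those without a score
result <- merge(df, scoredf, by = "line", all.x = TRUE)

# Phrases with no matched words get a neutral score of 0
result$scores[is.na(result$scores)] <- 0
result
```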

This works for any number of phrases: you get one overall sentiment score per phrase.
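If you score new batches of phrases often, the steps can be wrapped in a small helper. The function name score_sentences() is hypothetical, and it uses a rough base-R tokenizer (lower-case, split on non-alphanumerics) rather than tidytext's. It looks each word up in a named vector and returns one score per phrase, with 0 for phrases that contain no lexicon words.

```r
# Hypothetical helper: score each phrase by summing the lexicon scores of its words.
# 'lexicon' is a data frame with columns 'word' (character) and 'scores' (numeric).
score_sentences <- function(text, lexicon) {
  # Named lookup vector: word -> score
  lookup <- setNames(lexicon$scores, lexicon$word)
  tokens <- strsplit(tolower(text), "[^a-z0-9']+")
  vapply(tokens, function(words) {
    # Words missing from the lexicon index to NA and are dropped by na.rm
    sum(lookup[words], na.rm = TRUE)
  }, numeric(1))
}

lexicon <- data.frame(word = c("happy", "sad"), scores = c(1.3455, -1.0552),
                      stringsAsFactors = FALSE)
phrases <- c("I am happy and kind of sad", "sad is sad, happy is good")
scored <- data.frame(text = phrases, scores = score_sentences(phrases, lexicon))
scored
```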
Hope it helps!
