kimia ghorbani

Reputation: 1

I ran a VADER sentiment analysis on multiple files and the compound score for all of them was 1; how can I validate this result?

I have transcript files from 6 different interviews and am running a sentiment analysis on the text using VADER. The compound score for every file was 1. This does not seem correct to me, but I'm not sure why it happened or how to troubleshoot it.

The code I have is:

  for (i in MD_scripts) {
    file_MD <- read_file(i)
    gsub("[\r\n]", "", file_MD)
    vader_MD <- get_vader(file_MD)
    df_vader <- data.frame(rbind(df_vader, vader_MD))
  }

The pos, neu, neg scores are also eerily similar, but not exactly the same. Any tips/ideas?

I thought of running VADER on individual sentences (which I managed to do) and then calculating the overall compound score by hand, but I could not figure out how to do that.

Upvotes: 0

Views: 166

Answers (1)

Rui Barradas

Reputation: 76651

Here are two ways of correcting the code in the question.

First, use a for loop. The results list vader_list has to be created beforehand, so that each iteration can store its result by index.

library(vader)

vader_list <- vector("list", length = length(MD_scripts))
for (i in seq_along(MD_scripts)) {
  file_MD <- MD_scripts[[i]] |>
    readLines() |> 
    paste(collapse = " ")
  vader_list[[i]] <- get_vader(file_MD)
} 
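
The loop in the question tried to grow a data frame with rbind on every iteration. With a results list, the rows can instead be combined once after the loop. The sketch below assumes each list element is a named vector with compound, pos, neu and neg entries, which is what get_vader() is understood to return; the values in the stand-in list are made up purely so the snippet runs without the transcript files.

```r
# Stand-in for the list filled by the loop above.
# ASSUMPTION: each element is a named character vector, as get_vader()
# is understood to return; the numbers are invented for illustration.
vader_list <- list(
  c(compound = "0.91", pos = "0.21", neu = "0.70", neg = "0.09"),
  c(compound = "0.88", pos = "0.19", neu = "0.73", neg = "0.08")
)

# Bind all rows in one step instead of growing a data frame in the loop.
df_vader <- as.data.frame(do.call(rbind, vader_list),
                          stringsAsFactors = FALSE)

# The bound columns are character, so convert the score columns to numeric.
score_cols <- c("compound", "pos", "neu", "neg")
df_vader[score_cols] <- lapply(df_vader[score_cols], as.numeric)

print(df_vader)
```

Binding once at the end avoids copying the data frame on every iteration and does not require df_vader to exist before the loop starts.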

You can also use lapply, which makes the code simpler because the results list is created for you.

library(vader)

vader_list <- lapply(MD_scripts, \(fl) {
  fl |>
    readLines() |> 
    paste(collapse = " ") |>
    get_vader()
})
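
As for why every file gets a compound score of 1, the loop may not be the cause at all. In the reference VADER implementation, the compound score is the sum of the word valences x, squashed into [-1, 1] as x / sqrt(x^2 + alpha) with alpha = 15. Assuming the R port follows the same normalisation, the sum for a whole interview transcript becomes large and the score saturates near 1. A minimal base R sketch, where normalize_compound is a hypothetical helper and the sums are made-up values:

```r
# Normalisation used by the reference VADER implementation (alpha = 15).
# ASSUMPTION: the R vader package applies the same formula.
# normalize_compound is a hypothetical helper, not part of the package.
normalize_compound <- function(sum_valence, alpha = 15) {
  sum_valence / sqrt(sum_valence^2 + alpha)
}

# Made-up summed valences, standing in for texts of increasing length:
# a sentence, a paragraph, a page, a full transcript.
sum_valence <- c(2, 10, 50, 400)
compound <- normalize_compound(sum_valence)

# The score climbs towards 1 as the sum grows, and the largest one
# is indistinguishable from 1 once rounded to four decimals.
print(round(compound, 4))
```

If this is what is happening, scoring each sentence separately and averaging the sentence-level compound values, which the VADER authors suggest for longer texts, should give a more informative figure per file than a single score for the whole transcript.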

Upvotes: 0
