rxmnnxfpvg

Reputation: 30993

Add factor column to dataframe based on existing column

Let's say I have a dataframe:

word <- c("good", "great", "bad", "poor", "eh")
userid <- c(1, 2, 3, 4, 5)
d <- data.frame(userid, word)

I want to add a dataframe column, sentiment, that is a factor and depends on what word is:

words_pos <- c("good", "great")
words_neg <- c("bad", "poor")
calculate_sentiment <- function(x) {
     if (x %in% words_pos) {
         return("pos")
     } else if (x %in% words_neg) {
         return("neg")
     }
     return(NA)
}
d$sentiment <- apply(d, 1, function(x) calculate_sentiment(x['word']))

However, d$sentiment is now of type "character". How do I make it a factor with the right levels (pos, neg, NA)? I'm not even sure whether NA should be a factor level, as I'm just learning R.

Thanks!

Upvotes: 1

Views: 13313

Answers (2)

Jonathan Carroll

Reputation: 3947

This isn't the simplest way to do it, but it is very readable and, in my opinion, preferable to hiding the logic in a separate function. Use dplyr's mutate together with case_when:

library(dplyr)
d2 <- mutate(d, sentiment = factor(case_when(word %in% words_pos ~ "pos",
                                             word %in% words_neg ~ "neg",
                                             TRUE                ~ NA_character_)))

glimpse(d2)
#> Observations: 5
#> Variables: 3
#> $ userid    <dbl> 1, 2, 3, 4, 5
#> $ word      <fctr> good, great, bad, poor, eh
#> $ sentiment <fctr> pos, pos, neg, neg, NA

I've spaced it out a bit so it's clearer, but this will:

  • take the data.frame d, then
  • mutate it, adding a column 'sentiment' that is a factor, defined by
  • a case_when statement with logical conditions on the left-hand side and results on the right-hand side (NA_character_ is needed so that every result has the same type).

Output confirms that this is a factor column with the desired values.
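To check the levels directly, and to control their order, something like the following sketch may help. It assumes dplyr is installed and reuses d, words_pos and words_neg from the question; the explicit levels argument is my addition, not part of the answer above. Without it, factor() sorts the levels alphabetically (neg before pos).

```r
library(dplyr)

# Data and word lists from the question
word <- c("good", "great", "bad", "poor", "eh")
userid <- c(1, 2, 3, 4, 5)
d <- data.frame(userid, word)
words_pos <- c("good", "great")
words_neg <- c("bad", "poor")

# Same mutate/case_when call, but with the level order set explicitly
d2 <- mutate(d, sentiment = factor(
  case_when(word %in% words_pos ~ "pos",
            word %in% words_neg ~ "neg",
            TRUE                ~ NA_character_),
  levels = c("pos", "neg")))

levels(d2$sentiment)   # only "pos" and "neg"; NA is not a level
is.na(d2$sentiment)    # the "eh" row is still recorded as missing
```

The unmatched word stays NA in the column without NA appearing among the levels.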

Upvotes: 4

PPC

Reputation: 167

You can wrap the last line of your code in as.factor, which gives a factor with the levels pos and neg. By the way, NA is not a factor level by default; it just marks a missing value.

d$sentiment <- as.factor(apply(d, 1, function(x) calculate_sentiment(x['word'])))
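As a rough alternative that avoids apply altogether, here is a vectorised base R sketch. It is not the approach from either answer, just one way the same idea could be written, and it also shows how NA relates to the levels. The intermediate vector name sentiment is only for illustration.

```r
# Data and word lists from the question
word <- c("good", "great", "bad", "poor", "eh")
userid <- c(1, 2, 3, 4, 5)
d <- data.frame(userid, word)
words_pos <- c("good", "great")
words_neg <- c("bad", "poor")

# Start with NA everywhere, then fill in the rows that match a word list
sentiment <- rep(NA_character_, nrow(d))
sentiment[d$word %in% words_pos] <- "pos"
sentiment[d$word %in% words_neg] <- "neg"
d$sentiment <- factor(sentiment, levels = c("pos", "neg"))

levels(d$sentiment)          # "pos" "neg": NA is not a level
is.na(d$sentiment)           # TRUE only for the "eh" row
levels(addNA(d$sentiment))   # addNA() adds NA as an explicit level if you want one
```

Leaving NA out of the levels is usually what you want, since functions such as table() and summary() already treat it as missing; addNA() is there for the cases where it should count as its own category.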

Upvotes: 1
