Sandy

Reputation: 1148

New column / mutate based on existing column

I want to add a new column to a dataframe df based on a condition from the existing columns e.g.,

df$TScore = as.factor(0)
df$TScore = 
  if_else(df$test_score >= '8.0', 'high',
      if_else(!is.na(df$test_score), 'low', 'NA'))

The problem I am facing is, for some cases TScore is what I would expect it to be i.e., 'high' when the score is 8 or greater but for some cases it is not correct. Is there an error in the above code? There are lots of NAs in this data.

I am also struggling with how to write it using dplyr. So far, I have written this:

df$TScore =   df %>%
                filter(test_score >= 8) %>%
                    mutate(TScore = 'high')

But as we would expect, the dimensions do not match. The following error is given:

Error in `$<-.data.frame`(`*tmp*`, appScore, value = list(cluster3 = c(1L,  : replacement has 126 rows, data has 236

Any advice would be greatly appreciated.

Upvotes: 1

Views: 260

Answers (1)

akrun

Reputation: 887741

We don't need the filter; instead, we can use ifelse or case_when:

library(dplyr)
df <- df %>%
  # Scores of 8 or more become "high"; everything else, including NA, becomes "low"
  mutate(TScore = case_when(test_score >= 8 ~ "high", TRUE ~ "low"))
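For completeness, a base R ifelse version might look like the sketch below. The toy data frame and its values are made up for illustration; only the column names test_score and TScore come from the question.

```r
# Toy data (hypothetical values, for illustration only)
df <- data.frame(test_score = c(9.5, 8, 3, NA, 10))

# ifelse() returns NA wherever the condition is NA,
# so rows with a missing score stay NA instead of becoming "low"
df$TScore <- ifelse(df$test_score >= 8, "high", "low")

print(df)
```

Unlike the case_when call with a TRUE fallback, this leaves missing scores as NA without an explicit is.na check.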

If we want to avoid the assignment (<-), we can use the compound assignment operator (%<>%) from magrittr. This version also keeps missing scores as NA:

library(magrittr)
df %<>%
  mutate(TScore = case_when(
    is.na(test_score) ~ NA_character_,  # keep missing scores as NA
    test_score >= 8 ~ "high",           # NA rows were already matched above
    TRUE ~ "low"
  ))

The error occurred because a filtered data.frame (126 rows) was assigned to a new column of the original dataset (236 rows).
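As for the first snippet in the question, one likely cause of the unexpected labels is the quoted threshold '8.0'. Comparing against a string makes R compare the values as text, so a score of 10 is not treated as greater than "8.0". The question does not say how test_score is stored, so the sketch below is an assumption about the cause, with made-up values:

```r
# A quoted threshold forces a character comparison: the number is
# converted to a string and compared character by character
quoted   <- 10 >= "8.0"   # FALSE, because "10" sorts before "8.0"
unquoted <- 10 >= 8       # TRUE

# If the scores are stored as text, convert them before comparing
# (hypothetical values, for illustration only)
score_chr <- c("9.5", "10", "3", NA)
score_num <- as.numeric(score_chr)
labels <- ifelse(score_num >= 8, "high", "low")

print(quoted)
print(unquoted)
print(labels)
```

Using the unquoted number 8, as in the case_when calls above, avoids the problem when test_score is numeric.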

Upvotes: 1
