Reputation: 1148
I want to add a new column to a data frame df based on a condition on an existing column, e.g.:
df$TScore = as.factor(0)
df$TScore =
  if_else(df$test_score >= '8.0', 'high',
          if_else(!is.na(df$test_score), 'low', 'NA'))
The problem I am facing is, for some cases TScore is what I would expect it to be i.e., 'high' when the score is 8 or greater but for some cases it is not correct. Is there an error in the above code? There are lots of NAs in this data.
I am also struggling with how to write this using dplyr. So far, I have written:
df$TScore = df %>%
  filter(test_score >= 8) %>%
  mutate(TScore = 'high')
But as we would expect, the dimensions do not match. The following error is given:
Error in `$<-.data.frame`(`*tmp*`, appScore, value = list(cluster3 = c(1L, : replacement has 126 rows, data has 236
Any advice would be greatly appreciated.
Upvotes: 1
Views: 260
Reputation: 887741
We don't need the filter step; instead, we can use ifelse or case_when:
library(dplyr)

# Note: rows where test_score is NA fall through to "low" here
df <- df %>%
  mutate(TScore = case_when(test_score >= 8 ~ "high", TRUE ~ "low"))
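For comparison, here is a minimal base R sketch of the ifelse route mentioned above. The toy data frame and its values are made up for illustration and only stand in for the real df; the column names test_score and TScore are taken from the question.

```r
# Toy data standing in for the real df (hypothetical values)
df <- data.frame(test_score = c(9.5, 8, 3, NA, 10))

# ifelse returns NA wherever the condition itself is NA,
# so missing scores stay missing instead of becoming "low"
df$TScore <- ifelse(df$test_score >= 8, "high", "low")

print(df)
```

Unlike the case_when call with a TRUE fallback, this leaves NA scores as NA without any extra branch.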
If we want to avoid the assignment (<-), we can use the compound assignment operator (%<>%) from magrittr:
library(magrittr)

df %<>%
  mutate(TScore = case_when(is.na(test_score) ~ NA_character_,
                            test_score >= 8 & !is.na(test_score) ~ "high",
                            TRUE ~ "low"))
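Note that the code above compares test_score with the number 8, while the first attempt in the question compares it with the string '8.0'. That is a likely cause of the unexpected results there: when a number is compared with a string, R coerces the number to character and compares text. A small sketch, assuming test_score is numeric and using made-up scores:

```r
# Hypothetical scores, chosen only to illustrate the comparison
scores <- c(7.5, 9, 10, 12.5)

# Comparing with a string coerces the numbers to character,
# so the comparison is done on text ("10" sorts before "8.0")
as_text <- scores >= "8.0"

# Comparing with a number gives the intended numeric result
as_number <- scores >= 8

print(data.frame(scores, as_text, as_number))
```

With the string comparison, scores of 10 and above are not flagged, which would match the "correct for some cases, wrong for others" behaviour described in the question.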
The error in the question occurred because a filtered data.frame (126 rows) was assigned to a new column of the original dataset (236 rows), so the row counts did not match.
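The row mismatch can be reproduced on a small scale with base R alone. This is only a sketch with made-up data; the name high is hypothetical, and base subsetting is used in place of dplyr's filter.

```r
# Toy data standing in for the real df (hypothetical values)
df <- data.frame(test_score = c(9.5, 8, 3, NA, 10))

# Filtering drops rows, so the result is shorter than df
high <- df[which(df$test_score >= 8), , drop = FALSE]
cat("rows in df:", nrow(df), " rows after filtering:", nrow(high), "\n")

# Assigning the shorter data frame to a column of df fails;
# tryCatch captures the error message instead of stopping the script
result <- tryCatch({ df$TScore <- high; "no error" },
                   error = function(e) conditionMessage(e))
print(result)
```

Computing the new column inside mutate, as shown earlier, avoids this because the result always has one value per row of df.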
Upvotes: 1