With the following sample data, I'm trying to create a new column "NOTA_NUM" (with value 0, 1, 2, 3, or 4) in my data frame (df) based on the five possible values ("A", "B", "C", "D", "E") of one existing column (column1).
I have already tried:
df$NOTA_NUM <- ifelse(rowSums(df[ , "column1"]=="A"), 0,
ifelse(rowSums(df[ , "column1"]=="B"), 1,
ifelse(rowSums(df[ ,"column1"]=="C"), 2,
ifelse(rowSums(df[ , "column1"]=="D",3,4))
but that didn't work the way I would like.
I want "NOTA_NUM" to look like:
column1 NOTA_NUM
A 0
C 2
B 1
D 3
E 4
Upvotes: 0
I like dplyr::case_when for these situations:
library(dplyr)
df <- data.frame(column1 = c("A", "C", "B", "D", "E")) %>%
mutate(NOTA_NUM = case_when(column1 == "A" ~ 0,
column1 == "B" ~ 1,
column1 == "C" ~ 2,
column1 == "D" ~ 3,
TRUE ~ 4))
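One detail worth knowing about the catch-all: a comparison against NA is never TRUE, so a missing column1 value falls through to TRUE ~ 4. If missing values should stay missing, a minimal sketch (the input with an NA is made up for illustration and is not part of the question's data) puts an explicit is.na() condition first:

```r
library(dplyr)

# Hypothetical input containing a missing value (not from the question)
df_na <- data.frame(column1 = c("A", "C", NA, "E"), stringsAsFactors = FALSE)

res <- df_na %>%
  mutate(NOTA_NUM = case_when(is.na(column1) ~ NA_real_,  # keep NA as NA
                              column1 == "A" ~ 0,
                              column1 == "B" ~ 1,
                              column1 == "C" ~ 2,
                              column1 == "D" ~ 3,
                              TRUE ~ 4))                  # everything else

res$NOTA_NUM
```

Conditions are checked in order, so the is.na() line has to come before the catch-all.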
Upvotes: 1
Here are some approaches. No packages are used.
1) match Using DF (shown reproducibly in the Note at the end), match each element in column1 to LETTERS[1:4] and use 5 if there is no match. Subtract 1 from that.
transform(DF, NOTA_NUM = match(column1, LETTERS[1:4], nomatch = 5) - 1)
giving:
column1 NOTA_NUM
1 A 0
2 C 2
3 B 1
4 D 3
5 E 4
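To see what nomatch does with values outside A to D, here is a small sketch on a made-up vector (the "Z" and NA entries are assumptions for illustration only). Both an unknown code and NA fail to match, so both end up as 4:

```r
# Hypothetical input with an unknown code and a missing value
x <- c("A", "Z", NA, "D")

# Positions 1..4 for A..D, 5 for anything else, then shift to 0..4
codes <- match(x, LETTERS[1:4], nomatch = 5) - 1
codes

# If missing inputs should stay missing, restore them afterwards
codes_keep_na <- codes
codes_keep_na[is.na(x)] <- NA
```

Whether NA should map to 4 or stay NA depends on the data, so the last step is optional.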
2) switch Another possibility is to use switch:
transform(DF, NOTA_NUM = sapply(column1, switch, A = 0, B = 1, C = 2, D = 3, 4))
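A slightly stricter variant of the same idea, as a sketch: vapply checks that every result is a single number and can drop the names that sapply attaches to character input. The helper name recode_one is made up for this example. Note that switch matches names only when it is given a character string, so a factor column would need as.character() first.

```r
DF <- data.frame(column1 = c("A", "C", "B", "D", "E"), stringsAsFactors = FALSE)

# Hypothetical helper: map one code to its number, 4 as the fallback
recode_one <- function(x) switch(x, A = 0, B = 1, C = 2, D = 3, 4)

# numeric(1) declares that each call must return exactly one number
DF$NOTA_NUM <- vapply(DF$column1, recode_one, numeric(1), USE.NAMES = FALSE)
DF
```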
3) arithmetic This uses an arithmetic expression which evaluates to the required values:
transform(DF, NOTA_NUM = (0-4) * (column1 == "A") +
(1-4) * (column1 == "B") +
(2-4) * (column1 == "C") +
(3-4) * (column1 == "D") +
4)
Note
DF <- data.frame(column1 = c("A", "C", "B", "D", "E"), stringsAsFactors = FALSE)
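As a quick sanity check, the three approaches can be run on the same input and compared. This is only a sketch; the variable names by_match, by_switch, by_arith and all_agree are chosen for the example.

```r
DF <- data.frame(column1 = c("A", "C", "B", "D", "E"), stringsAsFactors = FALSE)
x <- DF$column1

by_match  <- match(x, LETTERS[1:4], nomatch = 5) - 1
by_switch <- unname(sapply(x, switch, A = 0, B = 1, C = 2, D = 3, 4))
by_arith  <- (0-4) * (x == "A") + (1-4) * (x == "B") +
             (2-4) * (x == "C") + (3-4) * (x == "D") + 4

# TRUE when all three give the same numbers
all_agree <- isTRUE(all.equal(by_match, by_switch)) &&
             isTRUE(all.equal(by_match, by_arith))
all_agree
```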
Upvotes: 4
Not sure that I'd recommend as.numeric(factor(...)) as a general solution, but it works for your case:
library(dplyr)
set.seed(1001) # for reproducible sample
# column1 must be a factor; stringsAsFactors defaults to FALSE since R 4.0.0, so set it explicitly
data.frame(column1 = sample(LETTERS[1:5], 50, replace = TRUE), stringsAsFactors = TRUE) %>%
mutate(NOTA_NUM = as.numeric(column1) - 1)
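The reason this is fragile as a general solution: the integer codes depend on which levels happen to occur in the data. A small sketch with a made-up vector in which "B" and "D" never appear shows the codes shifting, and how fixing the levels up front avoids it:

```r
# Hypothetical sample where "B" and "D" do not occur
x <- c("A", "C", "E")

# Levels inferred from the data are A, C, E, so the codes are 1, 2, 3
implicit <- as.numeric(factor(x)) - 1

# Levels fixed to A..E, so the codes are 1, 3, 5
explicit <- as.numeric(factor(x, levels = LETTERS[1:5])) - 1

implicit  # 0 1 2
explicit  # 0 2 4
```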
Upvotes: 0
I would avoid ifelse for this purpose. A table lookup is more compact and efficient: use a named vector as the table and pass the inputs to the "[" function:
> lookup = c(A=0, C= 2, B = 1, D= 3, E = 4)
> df <- data.frame( cl1 = names(lookup))
> df
cl1
1 A
2 C
3 B
4 D
5 E
> df$NOTA_NUM= lookup[as.character(df$cl1)]
> df
cl1 NOTA_NUM
1 A 0
2 C 2
3 B 1
4 D 3
5 E 4
If you need these to be letters, then quote them in the lookup vector, but beware that in R versions before 4.0.0 the data.frame function will make them factors unless you explicitly prevent that default. See ?data.frame for the proper use of the stringsAsFactors parameter.
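The factor caveat matters for the lookup itself, not just for the output type. Indexing a named vector with a factor uses the factor's integer codes rather than its labels, which silently gives the wrong numbers when the lookup is not in level order. A minimal sketch (the names f, by_code and by_name are chosen for this example):

```r
lookup <- c(A = 0, C = 2, B = 1, D = 3, E = 4)
f <- factor(c("A", "C", "B", "D", "E"))   # levels sort to A, B, C, D, E

# Factor index: uses integer codes 1, 3, 2, 4, 5, i.e. positions in lookup
by_code <- unname(lookup[f])

# Character index: looks up by name, which is what is wanted here
by_name <- unname(lookup[as.character(f)])

by_code  # 0 1 2 3 4  (C and B are swapped)
by_name  # 0 2 1 3 4
```

Wrapping the index in as.character() costs nothing when the column is already character, so it is a cheap safeguard.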
Upvotes: 0