Heliornis

Reputation: 391

Assign new values to levels in R

all,

I have a large data set (over 2 million rows), and in one of the columns I have the following levels:

"0"     "0.001" "1"     "4"     "4.001" "8.001"

I want to make a new column where each of those has a new, corresponding letter:

0 = x, 0.001 = D, 1 = C, 4 and 4.001 = B, and 8.001 = A

Is there a way to do this without using a for loop with six if statements? I tried that, and it was taking forever to run.

Here's a test sample:

      a b
1 0.000 x
2 4.000 B
3 1.000 C
4 0.001 D
5 1.000 C
6 4.000 B
7 4.001 B
8 1.000 C
9 8.001 A

Thank you.

Upvotes: 0

Views: 2246

Answers (4)

Samuel Reuther

Reputation: 109

I would try this, though I'm not sure about the runtime:

library(forcats)
df = data.frame(a = c("0", "0.001", "1", "4", "4.001", "8.001"))
df$b <- fct_recode(df$a,
               x = "0",
               D = "0.001",
               C = "1",
               B = "4",
               B = "4.001",
               A = "8.001")


Upvotes: 0

akrun

Reputation: 887138

The easiest way would be to create a key/value dataset and join it with the original data:

keyval <- data.frame(a = c(0, 0.001, 1, 4, 4.001, 8.001), 
     b = c('x', 'D', 'C', 'B', 'B', 'A'), stringsAsFactors= FALSE)
library(data.table)
setDT(df1)[keyval, b := b, on = .(a)]
df1
#       a b
#1: 0.000 x
#2: 4.000 B
#3: 1.000 C
#4: 0.001 D
#5: 1.000 C
#6: 4.000 B
#7: 4.001 B
#8: 1.000 C
#9: 8.001 A

data

df1 <- structure(list(a = c(0, 4, 1, 0.001, 1, 4, 4.001, 1, 8.001)), 
    .Names = "a", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9"), class = "data.frame")

Upvotes: 2

Drj

Reputation: 1256

I do not believe there is a single built-in command that does this for you. By the way, explicit for loops over rows are slow in R compared with vectorized operations, so they are best avoided on large data sets.

Option 1:
What you may want to try is logical indexing, which selects the matching rows with a TRUE/FALSE vector and assigns to all of them at once.

# Start with an empty character column, then fill it by logical index
df$NewColumn <- NA_character_

idx <- df$a == "0"
df$NewColumn[idx] <- "x"

idx <- df$a == "4"
df$NewColumn[idx] <- "B"

and so on and so forth...
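
The repeated assignments above can also be collapsed into a single vectorized lookup. This is only a minimal sketch, assuming the values in `a` convert to exactly the strings listed in the question; the sample `df` and the name `lookup` are illustrative and not part of the original answer:

```r
# Sample data taken from the question (illustrative)
df <- data.frame(a = c(0, 4, 1, 0.001, 1, 4, 4.001, 1, 8.001))

# Named character vector: names are the old values, elements are the new letters
lookup <- c("0" = "x", "0.001" = "D", "1" = "C",
            "4" = "B", "4.001" = "B", "8.001" = "A")

# Index the lookup by the character form of a; unname() drops the carried-over names
df$NewColumn <- unname(lookup[as.character(df$a)])
```

Any value of `a` that is missing from `lookup` ends up as NA, which makes unexpected levels easy to spot.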

Option 2:
Use revalue from plyr, which is simpler to write but could be more compute-intensive than option 1. It should still work easily for your data size.

library(plyr)
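# revalue() works on factors or character vectors, so convert first in case a is numeric
df$NewColumn <- revalue(as.character(df$a), c("0" = "x", "0.001" = "D", "1" = "C", "4" = "B", "4.001" = "B", "8.001" = "A"))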

For either option, make sure the column's class is handled correctly. From your example it's hard for me to tell whether the data is factor or numeric, but either way it's a simple change to make in my sample code.
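
As a quick illustration of why the class matters, here is a small sketch with a made-up factor `f` (not the asker's data). Converting a factor straight to numeric returns its internal integer codes, so it has to go through character first:

```r
# Hypothetical factor holding three of the levels from the question
f <- factor(c("0", "4.001", "8.001"))

class(f)                                 # check the class before recoding
codes  <- as.numeric(f)                  # internal integer codes, not the original values
values <- as.numeric(as.character(f))    # the actual numbers stored as level labels
```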

Upvotes: 1

Krina M

Reputation: 155

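Try factor(x, levels = c(the existing values), labels = c(the new letters)), with the values in each vector separated by commas. Note that as.factor() itself does not take a levels argument.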
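
A minimal sketch of this idea, using factor() with both levels and labels on the sample values from the question. The names `x` and `b` are illustrative, and as far as I know repeating a label (here "B") to merge two levels needs R 3.4.0 or later:

```r
# Sample values from the question (illustrative)
x <- c(0, 4, 1, 0.001, 1, 4, 4.001, 1, 8.001)

# levels lists the existing values; labels gives the new letter for each, in the same order
b <- factor(x,
            levels = c(0, 0.001, 1, 4, 4.001, 8.001),
            labels = c("x", "D", "C", "B", "B", "A"))

# Convert to character if a plain letter column is wanted rather than a factor
b_chr <- as.character(b)
```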

Upvotes: 0
