Reputation: 165
I have a big dataset with its format being similar to the followings:
names <- c('s1','s2','s3', 's4', 's5','s6', 's7', 's8','s9')
metals <- c(4.2, 5.3, 5.4,6, 7,8.5,0, 10.1,11)
plastics <- c(5.1, 0, 2.4,6.1, 7.7,5.5,1.99, 0 ,2.5)
grade<- c("AA", "AB", "AB", "AB", "AC" , "AB", NA , NA, NA)
my_df <- data.frame(names, metals, plastics, grade )
I need to recode each column For numeric columns I need to assign 1 where the value is greater than 0 and for the "grade" columns lets assume I want AA=1, AB=2, AC=3. what is the most efficient way to do so?
Thanks!
Upvotes: 0
Views: 231
Reputation: 93908
As always in R, there are a million ways to do even the simplest task. Here's 2 more:
numvars <- sapply(my_df, is.numeric)
my_df[numvars] <- lapply(my_df[numvars], findInterval, 1)
my_df$grade <- c(2,1,3)[match(my_df$grade, c("AB","AA","AC"))]
#newvals #oldvals
# names metals plastics grade
#1 s1 1 1 1
#2 s2 1 0 2
#3 s3 1 1 2
#4 s4 1 1 2
#5 s5 1 1 3
#6 s6 1 1 2
#7 s7 0 1 NA
#8 s8 1 0 NA
#9 s9 1 1 NA
Upvotes: 2
Reputation: 3597
Using apply
for numeric columns and match
for character column
Edited as per@thelatemail's comments to avoid intermediate matrix coercion
my_df[,sapply(my_df,is.numeric)] = lapply(my_df[,sapply(my_df,is.numeric)],function(x) ifelse(x>0,1,0))
my_df$grade = match(my_df$grade,c("AA","AB","AC"))
my_df
# names metals plastics grade
#1 s1 1 1 1
#2 s2 1 0 2
#3 s3 1 1 2
#4 s4 1 1 2
#5 s5 1 1 3
#6 s6 1 1 2
#7 s7 0 1 NA
#8 s8 1 0 NA
#9 s9 1 1 NA
There will be other solutions using data.table,dplyr soon. You could use microbenchmark
to choose the best solution
Upvotes: 1
Reputation: 18681
Going off of @MFR's answer, here are two ways to do it:
NumColsToReplace = c("metals", "plastics")
my_df[NumColsToReplace] = ifelse(my_df[NumColsToReplace] > 0, 1, 0)
This allows you to pre-specify the columns you want to replace without copying the second line a bunch of times.
There is also another more efficient way using lapply
and replace
:
my_df[NumColsToReplace] = lapply(my_df[NumColsToReplace],
function(x) replace(x, x>0, 1))
This may be more typing, but it is two times as fast (or more) as the first method. Below are some benchmarking:
Unit: microseconds
expr min
lapply(my_df[NumColsToReplace], function(x) replace(x, x > 0, 1)) 23.949
ifelse(my_df[NumColsToReplace] > 0, 1, 0) 59.445
lq mean median uq max neval
26.515 29.92362 28.654 30.364 57.306 100
62.438 68.84436 63.721 73.129 159.515 100
So depending on how large your dataframe is. You way want to consider the second method.
levels(my_df$grade) <- c(1,2,3)
to recode grade mentioned by @thelatemail seems to be the most efficient.
Upvotes: 0
Reputation: 2077
Not sure if this one is the most efficient one, but we can use recode
in car
package for the character column.
my_df$metals <- ifelse (my_df$metals > 0, 1 , 0)
my_df$plastics <- ifelse (my_df$plastics > 0, 1 , 0)
library(car)
my_df$grade<-recode(my_df$grade, "'AA'=1; 'AB'='2'; 'AC'='3'")
Output
names metals plastics grade
1 s1 1 1 1
2 s2 1 0 2
3 s3 1 1 2
4 s4 1 1 2
5 s5 1 1 3
6 s6 1 1 2
7 s7 0 1 <NA>
8 s8 1 0 <NA>
9 s9 1 1 <NA>
Upvotes: 1