Reputation: 2263
I have a data frame that contains about 100 columns named some_text_microorganism_growth. They are stored as characters but are really an ordered factor (NG<SG<LG<MG<HG) with equivalent numerical values of (0,2.5,6,12,25,40). I can convert these column by column, but I need to use contains("growth") to do it for all the columns at once. Any thoughts?
Edited data:
df<-data.frame(ABC_growth=rep(c("MG","LG","NG"), each=5), ZFG_growth=rep(c("GG","LG","SG"),each=5),OtherCol=rep(c("AB*","CD;","other+"),each=5))
# Note: not all levels appear in each column, but they are common across all columns. The full set is: (NG<SG<LG<MG<HG)
For one column I do:
df$ABC_growth <- factor(df$ABC_growth) # convert to factor
df$ABC_growth <- ordered(df$ABC_growth, levels = c("SG","LG","MG","HG")) # order
levels(df$ABC_growth) <- c("2.5","12","40","100")
What do you think?
Upvotes: 2
Views: 804
Reputation: 2849
Here is a data.table approach using lapply, which calls the factor function once for each column. The levels and labels arguments set the allowed codes and the values they are relabelled to.
df <- data.frame(ABC_growth=rep(c("MG","LG","NG"), each=5),
ZFG_growth=rep(c("GG","LG","SG"),each=5),
test = rep(c("GG","LG","SG"),each=5))
library(data.table)
# Coerce data.frame to data.table object
setDT(df)
# Original with all variables including new variable named test
print(df)
#> ABC_growth ZFG_growth test
#> 1: MG GG GG
#> 2: MG GG GG
#> 3: MG GG GG
#> 4: MG GG GG
#> 5: MG GG GG
#> 6: LG LG LG
#> 7: LG LG LG
#> 8: LG LG LG
#> 9: LG LG LG
#> 10: LG LG LG
#> 11: NG SG SG
#> 12: NG SG SG
#> 13: NG SG SG
#> 14: NG SG SG
#> 15: NG SG SG
# Use grep to extract the variable names that match the provided pattern
cols <- grep('growth', names(df))
df[, lapply(.SD, function(x) factor(x,
levels = c("NG", "SG", "LG", "MG", "HG"),
labels = c('0', '2.5', '12', '40', '100')
))][, ..cols]
#> ABC_growth ZFG_growth
#> 1: 40 <NA>
#> 2: 40 <NA>
#> 3: 40 <NA>
#> 4: 40 <NA>
#> 5: 40 <NA>
#> 6: 12 12
#> 7: 12 12
#> 8: 12 12
#> 9: 12 12
#> 10: 12 12
#> 11: 0 2.5
#> 12: 0 2.5
#> 13: 0 2.5
#> 14: 0 2.5
#> 15: 0 2.5
Created on 2021-03-16 by the reprex package (v0.3.0)
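If the goal is to change the growth columns inside the table rather than print a converted copy, a minimal sketch along the same lines is to assign by reference with := and restrict the work to the matched columns through .SDcols. The names dt and growth_cols are illustrative and not part of the answer above; the levels and labels are the ones used there.

```r
library(data.table)

# Same sample data as above, rebuilt so the sketch is self-contained
dt <- data.table(
  ABC_growth = rep(c("MG", "LG", "NG"), each = 5),
  ZFG_growth = rep(c("GG", "LG", "SG"), each = 5),
  test       = rep(c("GG", "LG", "SG"), each = 5)
)

# Names (not positions) of the columns to convert
growth_cols <- grep("growth", names(dt), value = TRUE)

# Update only those columns by reference; `test` stays character
dt[, (growth_cols) := lapply(.SD, factor,
                             levels  = c("NG", "SG", "LG", "MG", "HG"),
                             labels  = c("0", "2.5", "12", "40", "100"),
                             ordered = TRUE),
   .SDcols = growth_cols]
```

Codes that are not in levels, such as "GG" in the sample data, still become NA.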
Upvotes: 2
Reputation: 887118
We can use mutate with across
library(dplyr)
df <- df %>%
mutate(across(contains('growth'), ~ ordered(.,
levels = c("NG", "SG", "LG", "MG", "HG"),
labels = c('0', '2.5', '12', '40', '100'))))
Or with lapply in base R
nm1 <- grep('growth', names(df), value = TRUE)
df[nm1] <- lapply(df[nm1], function(x) ordered(x,
levels = c("NG", "SG", "LG", "MG", "HG"),
labels = c('0', '2.5', '12', '40', '100')))
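The labels above are character strings, so the result is an ordered factor whose labels only look like numbers. If real numeric columns are wanted, one possible sketch is a named lookup vector indexed by the codes. The name growth_values is made up for illustration, and the code-to-value mapping is an assumption taken from the labels in this answer, since the numeric list in the question does not line up one-to-one with the five levels.

```r
df <- data.frame(
  ABC_growth = rep(c("MG", "LG", "NG"), each = 5),
  ZFG_growth = rep(c("GG", "LG", "SG"), each = 5),
  OtherCol   = rep(c("AB*", "CD;", "other+"), each = 5)
)

# Assumed code-to-value mapping, taken from the labels used in the answers
growth_values <- c(NG = 0, SG = 2.5, LG = 12, MG = 40, HG = 100)

nm1 <- grep("growth", names(df), value = TRUE)

# Index the named vector by the codes; unknown codes such as "GG" become NA
df[nm1] <- lapply(df[nm1], function(x) unname(growth_values[as.character(x)]))
```

Numeric columns sort and compare in the intended order without needing a factor at all.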
This can also be done with ftransform (ftransformv for multiple columns) from collapse
library(collapse)
f1 <- function(x) {
ordered(x, levels = c("NG", "SG", "LG", "MG", "HG"),
labels = c('0', '2.5', '12', '40', '100'))
}
i1 <- grep('growth', names(df))
ftransformv(df, i1, f1)
-output
# ABC_growth ZFG_growth
#1 40 <NA>
#2 40 <NA>
#3 40 <NA>
#4 40 <NA>
#5 40 <NA>
#6 12 12
#7 12 12
#8 12 12
#9 12 12
#10 12 12
#11 0 2.5
#12 0 2.5
#13 0 2.5
#14 0 2.5
#15 0 2.5
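The NA entries in ZFG_growth come from the code "GG", which is not one of the supplied levels. With about 100 columns it may be worth listing such codes before converting. A rough base R sketch, run on the raw character data, where growth_levels and unmatched are illustrative names:

```r
df <- data.frame(
  ABC_growth = rep(c("MG", "LG", "NG"), each = 5),
  ZFG_growth = rep(c("GG", "LG", "SG"), each = 5)
)

growth_levels <- c("NG", "SG", "LG", "MG", "HG")
nm1 <- grep("growth", names(df), value = TRUE)

# For each growth column, collect the distinct codes that are not valid levels
unmatched <- lapply(df[nm1], function(x) {
  setdiff(unique(as.character(x)), growth_levels)
})

# Keep only the columns that actually contain unexpected codes
unmatched <- Filter(length, unmatched)
print(unmatched)
```

Here only ZFG_growth is reported, with "GG" as the unexpected code.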
Upvotes: 2