HCAI

Reputation: 2263

Convert multiple columns to factor and give them numerical values

I have a data frame with about 100 columns named some_text_microorganism_growth. They are stored as character but are really an ordered factor (NG<SG<LG<MG<HG) with equivalent numerical values (0, 2.5, 12, 40, 100). I can convert them column by column, but I need to use contains("growth") to do it for all the columns at once. Any thoughts?

Edited data:

df <- data.frame(ABC_growth = rep(c("MG","LG","NG"), each=5), ZFG_growth = rep(c("GG","LG","SG"), each=5), OtherCol = rep(c("AB*","CD;","other+"), each=5))

# Note: not every level appears in each column, but the level set is common to all columns. The full set is: (NG<SG<LG<MG<HG)

For one column I do:

df$ABC_growth <- factor(df$ABC_growth) # convert to factor
df$ABC_growth <- ordered(df$ABC_growth, levels = c("NG","SG","LG","MG","HG")) # order
levels(df$ABC_growth) <- c("0","2.5","12","40","100") # relabel with the numeric values

What do you think?

Upvotes: 2

Views: 804

Answers (2)

Eric

Reputation: 2849

Here is a data.table approach using lapply, which calls factor once for each column. The levels argument lists the original codes in order, and labels supplies the value each code is relabelled to.


df <- data.frame(ABC_growth=rep(c("MG","LG","NG"), each=5),
                 ZFG_growth=rep(c("GG","LG","SG"),each=5),
                 test = rep(c("GG","LG","SG"),each=5))

library(data.table)

# Coerce data.frame to data.table object

setDT(df)

# Original with all variables including new variable named test

print(df)

#>     ABC_growth ZFG_growth test
#>  1:         MG         GG   GG
#>  2:         MG         GG   GG
#>  3:         MG         GG   GG
#>  4:         MG         GG   GG
#>  5:         MG         GG   GG
#>  6:         LG         LG   LG
#>  7:         LG         LG   LG
#>  8:         LG         LG   LG
#>  9:         LG         LG   LG
#> 10:         LG         LG   LG
#> 11:         NG         SG   SG
#> 12:         NG         SG   SG
#> 13:         NG         SG   SG
#> 14:         NG         SG   SG
#> 15:         NG         SG   SG

# Use grep to get the positions of the columns whose names match the pattern

cols <- grep('growth', names(df))

# Apply factor() only to the matching columns via .SDcols

df[, lapply(.SD, function(x) factor(x,
  levels = c("NG", "SG", "LG", "MG", "HG"),
  labels = c('0', '2.5', '12', '40', '100')
)), .SDcols = cols]

#>     ABC_growth ZFG_growth
#>  1:         40       <NA>
#>  2:         40       <NA>
#>  3:         40       <NA>
#>  4:         40       <NA>
#>  5:         40       <NA>
#>  6:         12         12
#>  7:         12         12
#>  8:         12         12
#>  9:         12         12
#> 10:         12         12
#> 11:          0        2.5
#> 12:          0        2.5
#> 13:          0        2.5
#> 14:          0        2.5
#> 15:          0        2.5

Created on 2021-03-16 by the reprex package (v0.3.0)
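
The call above returns a new table holding only the converted columns. If the goal is to keep the other columns and update the growth columns where they sit, a minimal sketch using data.table's `:=` with `.SDcols` might look like the following. The names `dt` and `growth_cols` are illustrative only, and `ordered = TRUE` is added on the assumption that an ordered factor is wanted, as in the question.

```r
library(data.table)

# Same toy data as above, built directly as a data.table
dt <- data.table(
  ABC_growth = rep(c("MG", "LG", "NG"), each = 5),
  ZFG_growth = rep(c("GG", "LG", "SG"), each = 5),
  test       = rep(c("GG", "LG", "SG"), each = 5)
)

# Column names (not positions) that contain "growth"
growth_cols <- grep("growth", names(dt), value = TRUE)

# Overwrite only those columns by reference; `test` is left untouched
dt[, (growth_cols) := lapply(.SD, factor,
                             levels  = c("NG", "SG", "LG", "MG", "HG"),
                             labels  = c("0", "2.5", "12", "40", "100"),
                             ordered = TRUE),
   .SDcols = growth_cols]
```

Values outside the level set, such as "GG", still become NA, exactly as in the output above.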

Upvotes: 2

akrun

Reputation: 887118

We can use mutate with across from dplyr

library(dplyr)

df <- df %>%
  mutate(across(contains('growth'), ~ ordered(.,
    levels = c("NG", "SG", "LG", "MG", "HG"),
    labels = c('0', '2.5', '12', '40', '100'))))

Or with lapply in base R

nm1 <- grep('growth', names(df), value = TRUE)
df[nm1] <- lapply(df[nm1], function(x) ordered(x,
  levels = c("NG", "SG", "LG", "MG", "HG"),
  labels = c('0', '2.5', '12', '40', '100')))

Or this can be also done with ftransform (ftransformv - for multiple columns) from collapse

library(collapse)
f1 <- function(x) {
  ordered(x, levels = c("NG", "SG", "LG", "MG", "HG"),
    labels = c('0', '2.5', '12', '40', '100'))
}

i1 <- grep('growth', names(df))
ftransformv(df, i1, f1)

Output (growth columns):

#   ABC_growth ZFG_growth
#1          40       <NA>
#2          40       <NA>
#3          40       <NA>
#4          40       <NA>
#5          40       <NA>
#6          12         12
#7          12         12
#8          12         12
#9          12         12
#10         12         12
#11          0        2.5
#12          0        2.5
#13          0        2.5
#14          0        2.5
#15          0        2.5
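
Note that the labels in these factors are still character strings, so `as.numeric()` on the factor returns the internal level codes (1 to 5), not 0, 2.5, 12, 40, 100. If real numeric columns are needed, one possible base R sketch is a named lookup vector. The names `growth_values`, `growth_cols` and `df_num` are illustrative, and the value mapping is assumed to be the one used in the labels above.

```r
df <- data.frame(
  ABC_growth = rep(c("MG", "LG", "NG"), each = 5),
  ZFG_growth = rep(c("GG", "LG", "SG"), each = 5),
  OtherCol   = rep(c("AB*", "CD;", "other+"), each = 5)
)

# Assumed mapping from growth code to numeric value
growth_values <- c(NG = 0, SG = 2.5, LG = 12, MG = 40, HG = 100)
growth_cols   <- grep("growth", names(df), value = TRUE)

# Look each code up by name; codes not in the mapping (e.g. "GG") become NA
df_num <- df
df_num[growth_cols] <- lapply(df[growth_cols], function(x) {
  unname(growth_values[as.character(x)])
})

# Pitfall check on a relabelled ordered factor
f <- ordered(df$ABC_growth,
             levels = names(growth_values),
             labels = growth_values)
codes  <- as.integer(f)                # level positions, e.g. 4 for "MG"
values <- as.numeric(as.character(f))  # the labels as numbers, e.g. 40 for "MG"
```

The lookup keeps the numeric columns separate from the ordered factor, so either representation can be used depending on whether ordering or arithmetic is needed.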

Upvotes: 2
