CougarFan
CougarFan

Reputation: 1

Renaming/Recoding variables efficiently

I'm using R and have a dataset with ~3000 psychological test data. Most of the data is stored as string variables:

> table(rona_full$FQ169_6)

             At no time        Some of the time Less than half the time More than half the time 
                    418                      73                      48                      36 
       Most of the time         All of the time 
                     20                       9 

I want to recode the string data into variables as such:

> table(rona_full$FQ169_6)

  0   1   2   3   4   5 
443  63  39  30  21   9 

Currently, this is my approach:

rona_full$FQ169_6 <-ifelse(rona_full$MQ169_6 == "At no time", 0, #this recodes MDI from string
                    ifelse(rona_full$MQ169_6 == "Some of the time", 1,
                    ifelse(rona_full$MQ169_6 == "Less than half the time", 2,
                    ifelse(rona_full$MQ169_6 == "More than half the time", 3,
                    ifelse(rona_full$MQ169_6 == "Most of the time", 4, 5)))))

I imagine there is a more efficient way to perform this exact same recoding on 25+ columns rather than doing each one this way.

Upvotes: 0

Views: 511

Answers (4)

Ankit Sagar
Ankit Sagar

Reputation: 91

This code will take unique values in a column and assign them a unique number serially:

rona_full$FQ169_6 = as.character(as.integer(factor(rona_full$FQ169_6, levels = unique(rona_full$FQ169_6))))

Upvotes: 0

Arthur Welle
Arthur Welle

Reputation: 698

{data.table} is really efficient. You can use a custom function to be passed to a desired number of columns. For me this method is flexible and scalable.

install.packages("data.table")
library(data.table)

# dummy data
d <- data.table(id = 1:11, x = LETTERS[1:10], y = LETTERS[5:15])

# make a function suited to your needs
f <- function(x){
      as.numeric(
          fcase(
           x == "A", "1",
           x == "B", "2",
           x == "E", "5",
           x == "F", "6"
           ))}


# list of columns to where the function will be called
l_cols <- c("x", "y")

# test what you want creating new modified columns
#d[, c(paste0("new_",l_cols)) := lapply(.SD, f), .SDcols = l_cols]

# to change by reference your variables: no turning back this time
d[, c(l_cols) := lapply(.SD, f), .SDcols = l_cols]

EDIT: Maybe it would be more clear what´s going on if I use your data:

install.packages("data.table")
library(data.table)

# transform your data.frame to a data.table
d <- data.table::as.data.table(rona_full)

# make a function suited to your needs
f <- function(x){
      as.numeric(
          data.table::fcase(
           x == "At no time", "0",
           x == "Some of the time", "1",
           x == "Less than half the time", "2",
           x == "More than half the time", "3",
           x == "Most of the time", "4"
           ))}

# list of columns to where the function will be called (assuming MQ169_7)
l_cols <- c("MQ169_6", "MQ169_7")
   
# to change by reference your variables
d[, c(l_cols) := lapply(.SD, f), .SDcols = l_cols]

# confirm if it did work
head(d$MQ169_6)

Upvotes: 1

Marcelo Avila
Marcelo Avila

Reputation: 2374

Arthur's answer is a very good solution using the {data.table} package. In case you are more familiar to the {tidyverse} way-of-life, here is another option.

reprex first

First lets create a random data that looks like a bit like yours. I assume by looking at your current table output that your data is read as factor. You can check with is.factor().

categs <- c("At no time",
            "Some of the time",
            "Less than half the time",
            "More than half the time",
            "Most of the time", 
            "All of the time")

n <- 10
df <- data.frame(
  y   = rnorm(n),
  mq_1 = sample(categs, size = n, replace = T), 
  mq_2 = sample(categs, size = n, replace = T),
  mq_3 = sample(categs, size = n, replace = T)
) %>% 
  mutate(across(starts_with("mq"), factor))

Solution

If the naming of your variable follows a certain logic, you can use starts_with() to select the variables to transform, otherwise, create a vector with the variable names.

vars_to_transf <- c("mq_1", "mq_2", "mq_3")

df_transf <- df %>% mutate(
  across(starts_with("mq_"), .fns = as.numeric) #note: function without parenthesis
)
head(df_transf)
table(df_transf$mq_1)

More generic option

you can also pass a custom function to the mutate(across(.).) functions:

f_trans_factor <- function(x) {
  dplyr::case_when(
    x == "At no time"               ~ 0,
    x == "Some of the time"         ~ 1,
    x == "Less than half the time"  ~ 2,
    x == "More than half the time"  ~ 3,
    x == "Most of the time"         ~ 4,
    x == "All of the time"          ~ 5,
    TRUE                            ~ NA_real_
  )
}
df %>% 
  mutate(
    across(vars_to_transf, .fns = f_trans_factor) #note: function without parenthesis
  )

Upvotes: 1

tlhenvironment
tlhenvironment

Reputation: 349

I take it that you want to transform your dataset to factors, and you use the hand-made approach :D

You can get the factor levels by assessing the @.Data slot of a factor variable.

c("Foo", "Foo", "Bar") -> strings
as.factor(strings) -> factors
[email protected]

Upvotes: 0

Related Questions