Reputation: 1
I'm using R and have a dataset with ~3000 psychological test data. Most of the data is stored as string variables:
> table(rona_full$FQ169_6)
At no time Some of the time Less than half the time More than half the time
418 73 48 36
Most of the time All of the time
20 9
I want to recode the string data into variables as such:
> table(rona_full$FQ169_6)
0 1 2 3 4 5
443 63 39 30 21 9
Currently, this is my approach:
rona_full$FQ169_6 <-ifelse(rona_full$MQ169_6 == "At no time", 0, #this recodes MDI from string
ifelse(rona_full$MQ169_6 == "Some of the time", 1,
ifelse(rona_full$MQ169_6 == "Less than half the time", 2,
ifelse(rona_full$MQ169_6 == "More than half the time", 3,
ifelse(rona_full$MQ169_6 == "Most of the time", 4, 5)))))
I imagine there is a more efficient way to perform this exact same recoding on 25+ columns rather than doing each one this way.
Upvotes: 0
Views: 511
Reputation: 91
This code will take unique values in a column and assign them a unique number serially:
rona_full$FQ169_6 = as.character(as.integer(factor(rona_full$FQ169_6, levels = unique(rona_full$FQ169_6))))
Upvotes: 0
Reputation: 698
{data.table} is really efficient. You can use a custom function to be passed to a desired number of columns. For me this method is flexible and scalable.
install.packages("data.table")
library(data.table)
# dummy data
d <- data.table(id = 1:11, x = LETTERS[1:10], y = LETTERS[5:15])
# make a function suited to your needs
f <- function(x){
as.numeric(
fcase(
x == "A", "1",
x == "B", "2",
x == "E", "5",
x == "F", "6"
))}
# list of columns to where the function will be called
l_cols <- c("x", "y")
# test what you want creating new modified columns
#d[, c(paste0("new_",l_cols)) := lapply(.SD, f), .SDcols = l_cols]
# to change by reference your variables: no turning back this time
d[, c(l_cols) := lapply(.SD, f), .SDcols = l_cols]
EDIT: Maybe it would be more clear what´s going on if I use your data:
install.packages("data.table")
library(data.table)
# transform your data.frame to a data.table
d <- data.table::as.data.table(rona_full)
# make a function suited to your needs
f <- function(x){
as.numeric(
data.table::fcase(
x == "At no time", "0",
x == "Some of the time", "1",
x == "Less than half the time", "2",
x == "More than half the time", "3",
x == "Most of the time", "4"
))}
# list of columns to where the function will be called (assuming MQ169_7)
l_cols <- c("MQ169_6", "MQ169_7")
# to change by reference your variables
d[, c(l_cols) := lapply(.SD, f), .SDcols = l_cols]
# confirm if it did work
head(d$MQ169_6)
Upvotes: 1
Reputation: 2374
Arthur's answer is a very good solution using the {data.table}
package. In case you are more familiar to the {tidyverse}
way-of-life, here is another option.
First lets create a random data that looks like a bit like yours. I assume by
looking at your current table output that your data is read as factor. You can check
with is.factor()
.
categs <- c("At no time",
"Some of the time",
"Less than half the time",
"More than half the time",
"Most of the time",
"All of the time")
n <- 10
df <- data.frame(
y = rnorm(n),
mq_1 = sample(categs, size = n, replace = T),
mq_2 = sample(categs, size = n, replace = T),
mq_3 = sample(categs, size = n, replace = T)
) %>%
mutate(across(starts_with("mq"), factor))
If the naming of your variable follows a certain logic, you can use
starts_with()
to select the variables to transform, otherwise, create a vector with
the variable names.
vars_to_transf <- c("mq_1", "mq_2", "mq_3")
df_transf <- df %>% mutate(
across(starts_with("mq_"), .fns = as.numeric) #note: function without parenthesis
)
head(df_transf)
table(df_transf$mq_1)
you can also pass a custom function to the mutate(across(.).)
functions:
f_trans_factor <- function(x) {
dplyr::case_when(
x == "At no time" ~ 0,
x == "Some of the time" ~ 1,
x == "Less than half the time" ~ 2,
x == "More than half the time" ~ 3,
x == "Most of the time" ~ 4,
x == "All of the time" ~ 5,
TRUE ~ NA_real_
)
}
df %>%
mutate(
across(vars_to_transf, .fns = f_trans_factor) #note: function without parenthesis
)
Upvotes: 1
Reputation: 349
I take it that you want to transform your dataset to factors, and you use the hand-made approach :D
You can get the factor levels by assessing the @.Data slot of a factor variable.
c("Foo", "Foo", "Bar") -> strings
as.factor(strings) -> factors
[email protected]
Upvotes: 0