Reputation: 175
I need to create a function in R that transforms all entries of a dataframe, which are character strings, into integers, according to a previously determined "translation code".
Sample of input data:
Question 1          Question 2   Question 3
Strongly Agree      Agree        Disagree
Strongly Disagree   Neutral      Don't Know
The dataset I'll work with will have over 1000 lines and 50 columns. Each answer needs to be translated into an integer value. The formula for translation is:
Strongly disagree = 1, Disagree = 2, Neutral = 3, Agree = 4, Strongly agree = 5, Don't know = 0.
So the function output over this sample data would be
Question 1   Question 2   Question 3
5            4            2
1            3            0
My function attempt:
transform <- function(x)
{
  for (i in x[i, ]
  {
    if (i == 'Discordo fortemente') {i == 1}
    if (i == 'Discordo') {i == 2}
    if (i == 'Não concordo nem discordo') {i == 3}
    if (i == 'Concordo') {i == 4}
    if (i == 'Concordo fortemente') {i == 5}
    if (i == 'Não sei dizer') {i == 0}
  }
}
The labels above are in Portuguese. Obviously the code doesn't work, and I have been banging my head against the wall for nearly 2 hours. Any solution to my problem is welcome, although my idea is to build a function that works for one column and then apply it with lapply.
Upvotes: 1
Views: 65
Reputation: 47320
If the case were consistent, you could simply do:
mapping <- c(`Strongly disagree` = 1, Disagree = 2, Neutral = 3, Agree = 4,
             `Strongly agree` = 5, `Don't know` = 0)
df[] <- lapply(df, function(x) mapping[x])
or
df[] <- mapping[unlist(df)]
Because it isn't, you can upper-case both the mapping names and the data before the lookup:
mapping <- setNames(mapping,toupper(names(mapping)))
df[] <- lapply(df, function(x) mapping[toupper(x)])
df
# Question.1 Question.2 Question.3
# 1 5 4 2
# 2 1 3 0
or
df[] <- mapping[toupper(unlist(df))] # (same output)
data
df <- read.table(header=TRUE,stringsAsFactors=FALSE,text="
'Question 1' 'Question 2' 'Question 3'
'Strongly Agree' Agree Disagree
'Strongly Disagree' Neutral 'Don\\'t Know'")
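One thing the lookup above does not show is what happens to answers that are missing from the mapping: indexing a named vector by an unknown name yields NA. The following is a minimal sketch, not part of the answer itself, of how one might list such answers before recoding. The misspelled answer "Agre" and the names `answers` and `unmapped` are made up for illustration.

```r
# Mapping keyed by upper-cased labels, as in the answer above
mapping <- c("STRONGLY DISAGREE" = 1, "DISAGREE" = 2, "NEUTRAL" = 3,
             "AGREE" = 4, "STRONGLY AGREE" = 5, "DON'T KNOW" = 0)

# Hypothetical data containing one misspelled answer ("Agre")
df <- data.frame(
  Question.1 = c("Strongly Agree", "Agre"),
  Question.2 = c("Neutral", "Don't Know"),
  stringsAsFactors = FALSE
)

# Distinct answers that have no entry in the mapping
answers <- unique(toupper(unlist(df)))
unmapped <- setdiff(answers, names(mapping))
unmapped

# Recode column by column; unname() drops the label names from the result,
# and any unmapped answer becomes NA
df[] <- lapply(df, function(x) unname(mapping[toupper(x)]))
df
```

Here `unmapped` should contain only "AGRE", and the corresponding cell ends up as NA after recoding, which makes typos in the survey data easy to spot.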
Upvotes: 1
Reputation: 4414
for (i in colnames(x)) {
  x[,i] <- sapply(x[,i], function(j) switch(j,
    "Discordo fortemente" = 1,
    "Discordo" = 2,
    "Não concordo nem discordo" = 3,
    "Concordo" = 4,
    "Concordo fortemente" = 5,
    0))
}
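This approach uses only base R, in case you don't want to learn dplyr, but it can get unwieldy in general. Note that any value not listed, including 'Não sei dizer', falls through to the default of 0.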
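If the switch call grows too long, a named vector used as a lookup table is another base R option, and it fits the asker's idea of a one-column function applied with lapply. This is only a sketch under stated assumptions: `codes`, `recode_column`, and the sample data frame are hypothetical names and data, and unlike the switch version, unlisted values become NA rather than 0.

```r
# Lookup table from the question's Portuguese labels to scores
codes <- c("Discordo fortemente" = 1,
           "Discordo" = 2,
           "Não concordo nem discordo" = 3,
           "Concordo" = 4,
           "Concordo fortemente" = 5,
           "Não sei dizer" = 0)

# Hypothetical helper: recode one column.
# as.character() makes it work for factor columns too;
# unname() drops the label names from the result.
recode_column <- function(col) unname(codes[as.character(col)])

# Hypothetical sample data
x <- data.frame(
  q1 = c("Concordo", "Discordo fortemente"),
  q2 = c("Não sei dizer", "Concordo fortemente"),
  stringsAsFactors = FALSE
)

# Apply the helper to every column while keeping the data frame structure
x[] <- lapply(x, recode_column)
x
```

Adding or changing a label then means editing one entry of `codes` rather than touching the recoding logic.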
Upvotes: 1
Reputation: 11981
I would recommend using the case_when function. For example:
library(dplyr)
x %>%
  mutate_all(~case_when(.x == 'Discordo fortemente' ~ 1,
                        .x == 'Discordo' ~ 2,
                        .x == 'Não concordo nem discordo' ~ 3,
                        .x == 'Concordo' ~ 4,
                        .x == 'Concordo fortemente' ~ 5,
                        .x == 'Não sei dizer' ~ 0))
Here, x is your data. This code modifies all columns, and any value that matches none of the conditions becomes NA.
If you have other columns that you do not want to transform, you can use the mutate_at function instead of mutate_all.
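As a rough illustration of the mutate_at variant, here is a minimal sketch that recodes only the answer columns and leaves an identifier column alone. The data frame, the `id` and `q*` column names, and the `recoded` variable are assumptions made up for this example.

```r
library(dplyr)

# Hypothetical sample data: `id` must stay untouched, the `q*` columns hold answers
x <- data.frame(
  id = c(101, 102),
  q1 = c("Concordo", "Discordo fortemente"),
  q2 = c("Não sei dizer", "Concordo fortemente"),
  stringsAsFactors = FALSE
)

# Recode only the columns whose names start with "q"
recoded <- x %>%
  mutate_at(vars(starts_with("q")),
            ~ case_when(.x == "Discordo fortemente" ~ 1,
                        .x == "Discordo" ~ 2,
                        .x == "Não concordo nem discordo" ~ 3,
                        .x == "Concordo" ~ 4,
                        .x == "Concordo fortemente" ~ 5,
                        .x == "Não sei dizer" ~ 0))
recoded
```

The `vars()` selection decides which columns are touched, so `id` keeps its original values.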
If you want to make your own code work, you have to modify it as follows:
transform <- function(x) {
  # y holds the result; unmatched entries keep their position index
  y <- seq_along(x)
  for (i in seq_along(x)) {
    if (x[i] == 'Discordo fortemente') {y[i] <- 1}
    if (x[i] == 'Discordo') {y[i] <- 2}
    if (x[i] == 'Não concordo nem discordo') {y[i] <- 3}
    if (x[i] == 'Concordo') {y[i] <- 4}
    if (x[i] == 'Concordo fortemente') {y[i] <- 5}
    if (x[i] == 'Não sei dizer') {y[i] <- 0}
  }
  return(y)
}
transform(c("Discordo", 'Concordo fortemente', 'Não sei dizer'))
[1] 2 5 0
Upvotes: 3
Reputation: 481
Why not this:
library(dplyr)
transform_fct <- function(var) {
  case_when(
    var == "Strongly disagree" ~ 1,
    var == "Disagree" ~ 2,
    var == "Neutral" ~ 3,
    var == "Agree" ~ 4,
    var == "Strongly agree" ~ 5,
    var == "Don't know" ~ 0
  )
}
x <- x %>%
  mutate_all(transform_fct)
Upvotes: 2