Sigma

Reputation: 175

Creating a function in R that transforms strings into integers, over the entire dataframe

I need to create a function in R that transforms all entries of a dataframe, which are character strings, into integers, according to a previously determined "translation code".

Sample of input data:

Question 1          Question 2    Question 3

Strongly Agree      Agree         Disagree

Strongly Disagree   Neutral       Don't Know

The dataset I'll work with will have over 1000 rows and 50 columns. Each answer needs to be translated into an integer value. The translation key is:

Strongly disagree = 1, Disagree = 2, Neutral = 3, Agree = 4, Strongly agree = 5, Don't know = 0.

So the function output over this sample data would be

Question 1  Question 2  Question 3

5           4           2

1           3           0

My function attempt:

transform <- function(x)

{
  for (i in x[i, ]

  {
  if (i == 'Discordo fortemente')  {i == 1}
  if (i == 'Discordo')  {i == 2}
  if (i == 'Não concordo nem discordo') {i == 3}
  if (i == 'Concordo')  {i == 4}
  if (i == 'Concordo fortemente')  {i == 5}
  if (i == 'Não sei dizer')  {i == 0}
  }

}

The strings above are in Portuguese. Obviously the code doesn't work, and I have been banging my head against the wall for nearly 2 hours. Any solution is welcome, although my idea is to build a function that works for one column and then use it with lapply.

Upvotes: 1

Views: 65

Answers (4)

moodymudskipper

Reputation: 47320

If your data had consistent case, you could simply do:

mapping <- c(`Strongly disagree` = 1, Disagree = 2, Neutral = 3, Agree = 4,
  `Strongly agree` = 5, `Don't know` = 0)

df[] <- lapply(df, function(x) mapping[x])

or

df[] <- mapping[unlist(df)]

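Because the case is not consistent (the data has "Strongly Agree" while the key has "Strongly agree"), you can upper-case both sides before the lookup: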

mapping <- setNames(mapping,toupper(names(mapping)))
df[] <- lapply(df, function(x) mapping[toupper(x)])
df
#   Question.1 Question.2 Question.3
# 1          5          4          2
# 2          1          3          0

or

df[] <- mapping[toupper(unlist(df))] # (same output)

Data used above:

df <- read.table(header=TRUE,stringsAsFactors=FALSE,text="
'Question 1'          'Question 2'    'Question 3'
'Strongly Agree'      Agree         Disagree
'Strongly Disagree'   Neutral       'Don\\'t Know'")

Upvotes: 1

Noah

Reputation: 4414

for (i in colnames(x)) {
  x[,i] <- sapply(x[,i], function(j) switch(j,
                   "Discordo fortemente" = 1,
                   "Discordo" = 2,
                   "Não concordo nem discordo" = 3,
                   "Concordo" = 4,
                   "Concordo fortemente" = 5,
                   0))
}

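This approach uses base R if you don't want to learn dplyr, but it can get unwieldy in general. The final unnamed 0 is the fallback value of switch, so 'Não sei dizer' (and any other unmatched string) becomes 0.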

Upvotes: 1

Cettt

Reputation: 11981

I would recommend using the case_when function from dplyr. For example:

library(dplyr)
x %>%
 mutate_all(~case_when(.x == 'Discordo fortemente' ~ 1,
                       .x == 'Discordo' ~ 2, 
                       .x == 'Não concordo nem discordo' ~ 3, 
                       .x == 'Concordo' ~ 4, 
                       .x == 'Concordo fortemente' ~ 5, 
                       .x == 'Não sei dizer' ~ 0))

Here, x is your data frame. This code modifies all columns. If there are other columns that you do not want to transform, use mutate_at instead of mutate_all.
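As a rough sketch of the mutate_at variant mentioned above: the column names (id, q1, q2), the sample values and the helper name recode_answer are made up for illustration and are not from the question. It assumes a dplyr version that still provides mutate_at.

```r
library(dplyr)

# Hypothetical data: an id column that must stay untouched, plus two answer columns
x <- data.frame(
  id = 1:2,
  q1 = c("Concordo fortemente", "Discordo fortemente"),
  q2 = c("Concordo", "Não sei dizer"),
  stringsAsFactors = FALSE
)

# Hypothetical helper wrapping the same case_when mapping as above
recode_answer <- function(v) {
  case_when(
    v == "Discordo fortemente" ~ 1,
    v == "Discordo" ~ 2,
    v == "Não concordo nem discordo" ~ 3,
    v == "Concordo" ~ 4,
    v == "Concordo fortemente" ~ 5,
    v == "Não sei dizer" ~ 0
  )
}

# Only q1 and q2 are recoded; id is left as it is
result <- x %>%
  mutate_at(vars(q1, q2), recode_answer)
```

Any string that matches none of the conditions comes back as NA, which makes unexpected answers easy to spot.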

If you want to make your own code work, you have to modify it as follows:

transform <- function(x) {

  # Start from NA so that unmatched strings do not silently keep their index
  y <- rep(NA_real_, length(x))

  # seq_along(x) is also safe when x is empty
  for (i in seq_along(x)) {
    if (x[i] == 'Discordo fortemente')  {y[i] <- 1}
    if (x[i] == 'Discordo')  {y[i] <- 2}
    if (x[i] == 'Não concordo nem discordo') {y[i] <- 3}
    if (x[i] == 'Concordo')  {y[i] <- 4}
    if (x[i] == 'Concordo fortemente')  {y[i] <- 5}
    if (x[i] == 'Não sei dizer')  {y[i] <- 0}
  }

  return(y)
}

transform(c("Discordo", 'Concordo fortemente', 'Não sei dizer'))
# [1] 2 5 0
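The question mentions using a one-column function with lapply. A minimal base R sketch of that step could look like the following; the names recode_column and survey and the sample values are assumptions for illustration, and the per-column function here uses a named lookup vector instead of the loop.

```r
# Hypothetical per-column recoder: looks each string up in a named vector
recode_column <- function(x) {
  codes <- c("Discordo fortemente" = 1, "Discordo" = 2,
             "Não concordo nem discordo" = 3, "Concordo" = 4,
             "Concordo fortemente" = 5, "Não sei dizer" = 0)
  # Indexing by name returns NA for strings that are not in the key
  unname(codes[x])
}

# Hypothetical sample data mirroring the example in the question
survey <- data.frame(
  q1 = c("Concordo fortemente", "Discordo fortemente"),
  q2 = c("Concordo", "Não concordo nem discordo"),
  q3 = c("Discordo", "Não sei dizer"),
  stringsAsFactors = FALSE
)

# survey[] keeps the data frame structure while every column is replaced
survey[] <- lapply(survey, recode_column)
survey
#   q1 q2 q3
# 1  5  4  2
# 2  1  3  0
```

Assigning into survey[] instead of survey keeps the result a data frame; a plain assignment would leave you with a list.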

Upvotes: 3

FALL Gora

Reputation: 481

Why not this:

library(dplyr)
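transform_fct <- function(var) {
  # Lower-case first: the sample data has "Strongly Agree", the key "Strongly agree"
  var <- tolower(var)
  case_when(
    var == "strongly disagree" ~  1,
    var == "disagree" ~ 2,
    var == "neutral" ~ 3,
    var == "agree" ~ 4,
    var == "strongly agree" ~ 5,
    var == "don't know" ~ 0
  )
}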
x <- x %>%
  mutate_all(transform_fct)

Upvotes: 2
