Rose

Reputation: 27

How can I write a function to take the mean in a survey?

Hello, I am trying to write a function that gives me the mean of a set of scores for each of my data sets, but I need some help.

Below is an example. I need the function to give me the sum of the scores. Participants could select a value on a scale of 0-101. For questions 2, 5, and 6, I need to take the reverse score (101 minus the score). Lastly, I need to take the mean, i.e. the sum divided by 6. The expected outcome is a sum of 226 and a mean of 37.667.

data Question Scores
1 'I feel tense' 76
2 'I am calm' 90
3 'I am excited' 52
4 'I am worried' 65
5 'I am satisfied' 90
6 'I am relaxed' 90
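
To make the expected numbers concrete, the arithmetic for this one example can be checked by hand in R. This is only a sketch: the names scores, reverse_items, adjusted, total and average are made up for illustration, and reversing is assumed to mean 101 minus the score, as in the abs(x - 101) lines further down.

```r
# Scores in the order shown in the table above
scores <- c(76, 90, 52, 65, 90, 90)

# Questions 2, 5 and 6 are reverse-scored (assumption: reverse = 101 - score)
reverse_items <- c(2, 5, 6)

# Start from the raw scores and overwrite only the reversed items
adjusted <- scores
adjusted[reverse_items] <- 101 - scores[reverse_items]

total <- sum(adjusted)                # 226
average <- total / length(adjusted)   # 37.66667
```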

The problem is that I need the code to be universal so I can use it for every data set I have, because the questions are presented to each participant in random order. For example, question 1 ("I feel tense") could be in position 5, and so forth. So I think I need an if/else statement, but I am a total beginner in R. Maybe something like: if the question is "I am calm", do this, else do that?

I have code that works, but it is not very universal, because I have to go in and change it for each data set to match the questions. I would really appreciate some insight or help writing this function.


My code, which works for one data set, is kind of long and ugly. It is based on the actual data I am working with in Excel, which is why the names are different. (Data_c and Data_w just give me access to the specific columns I want.)

Data_c = DATA1$choice #Change here! 
Data_w = DATA2$word

I have to change each question depending on whether its score needs to be reversed. I commented out the lines I do not use.

Q1 = abs(Data_c[1]-101)
#Q1 = (Data_c[1])

#Q2 = abs(Data_c[2]-101)
Q2 = (Data_c[2])

Q3 = abs(Data_c[3]-101)
#Q3 = (Data_c[3])

Q4 = abs(Data_c[4]-101)
#Q4 = (Data_c[4])

#Q5 = abs(Data_c[5]-101)
Q5 = (Data_c[5])

#Q6 = abs(Data_c[6]-101)
Q6 = (Data_c[6])

df1 <- data.frame(Questions = Data_w,scores = c(Q1, Q2, Q3, Q4, Q5, Q6))

sum = df1$scores[1]+df1$scores[2]+df1$scores[3]+
df1$scores[4]+df1$scores[5]+df1$scores[6]

A = mean(sum/6)

Thank you in advance for reading this and for any insight.

Upvotes: 1

Views: 56

Answers (2)

Mohan Govindasamy

Reputation: 906

I created two functions: one takes the question numbers to reverse, and the other takes the text of the questions to reverse. With these functions, you can automate your code.

library(tidyverse)

df <- read.table(text = "data Question Scores
1 'I feel tense' 76
2 'I am calm' 90
3 'I am excited' 52
4 'I am worried' 65
5 'I am satisfied' 90
6 'I am relaxed' 90", header = TRUE)

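# Reverse-score (101 - score) the rows whose `data` number is listed in
# reverse_data, keep the other rows as they are, then average everything.
mean_of_scores_num <- function(df, reverse_data){
  df %>% 
    filter(!data %in% reverse_data) %>% 
    pull(Scores) %>% 
    append(101 - df %>% 
             filter(data %in% reverse_data) %>% 
             pull(Scores)) %>% 
    mean()
}

# Same idea, but the rows to reverse are identified by question text,
# so the order of the questions does not matter.
mean_of_scores_ques <- function(df, reverse_question){
  df %>% 
    filter(!Question %in% reverse_question) %>% 
    pull(Scores) %>% 
    append(101 - df %>% 
             filter(Question %in% reverse_question) %>% 
             pull(Scores)) %>% 
    mean()
}

mean_of_scores_num(df, c(1,3,5))
#> [1] 55

mean_of_scores_ques(df, c('I am relaxed', 'I am worried'))
#> [1] 59.16667

# The example from the question: reverse questions 2, 5 and 6
mean_of_scores_num(df, c(2,5,6))
#> [1] 37.66667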

Created on 2021-02-04 by the reprex package (v0.3.0)
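
The same idea can also be sketched in base R, without the tidyverse dependency. This is a minimal sketch rather than the code above: score_survey, survey and reverse_questions are hypothetical names, the columns are assumed to be called Question and Scores as in df, and reversing is assumed to mean 101 minus the score. Because the rows are matched on the question text, their order should not matter.

```r
# Hypothetical helper: mean score after reversing the listed questions
score_survey <- function(df, reverse_questions) {
  # TRUE for rows whose question text is in the reverse list
  is_reversed <- df$Question %in% reverse_questions
  # Reverse those rows (assumption: 101 - score), keep the rest unchanged
  adjusted <- ifelse(is_reversed, 101 - df$Scores, df$Scores)
  mean(adjusted)
}

survey <- data.frame(
  Question = c("I feel tense", "I am calm", "I am excited",
               "I am worried", "I am satisfied", "I am relaxed"),
  Scores = c(76, 90, 52, 65, 90, 90)
)

reverse_questions <- c("I am calm", "I am satisfied", "I am relaxed")

in_order <- score_survey(survey, reverse_questions)

# Same answers with the rows in a different order
shuffled <- score_survey(survey[c(5, 2, 6, 1, 4, 3), ], reverse_questions)
```

Both in_order and shuffled come out the same, since the reversal is tied to the question text rather than to the row position.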

Upvotes: 2

nya

Reputation: 2250

If I understand this correctly, you need a key that defines whether each statement's score is reversed. Note that the order of the statements in the key is irrelevant, but every statement that can appear in dat must be represented in key.

key <- read.table(text = " Question Reversed
1 'I am calm' FALSE
2 'I am excited' TRUE
3 'I feel tense' TRUE
4 'I am worried' TRUE
5 'I am satisfied' FALSE
6 'I am relaxed' FALSE", header = TRUE)

Then you need to look up, for each response in dat, the corresponding row of the key. This can be done with the match function.

dat <- read.table(text = "data Question Scores
1 'I feel tense' 76
2 'I am calm' 90
3 'I am excited' 52
4 'I am worried' 65
5 'I am satisfied' 90
6 'I am relaxed' 90", header = TRUE)

dat$Reversed = key$Reversed[match(dat$Question, key$Question)]
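
To see what match does here, a small standalone illustration may help (the vectors key_questions and responses are made up for the example). For each element of its first argument, match returns the position of that element in the second argument, or NA if it is not found.

```r
key_questions <- c("I am calm", "I am excited", "I feel tense")
responses <- c("I feel tense", "I am calm", "I am bored")

# Position of each response within key_questions
idx <- match(responses, key_questions)
idx
# "I feel tense" is 3rd in the key, "I am calm" is 1st,
# and "I am bored" is not in the key, so its position is NA
```

An NA here is the reason every statement in dat has to be present in key: indexing key$Reversed with NA yields NA, which would then propagate into the sum.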

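Since a TRUE/FALSE variable behaves like 1/0 in arithmetic, we can multiply each of the two alternatives (the raw score and the reversed score) by the indicator or by its negation, add everything up, and divide by the number of rows to get the mean.

res <- sum(dat$Scores * dat$Reversed, (101 - dat$Scores) * !dat$Reversed) / nrow(dat)

The exclamation mark in (101 - dat$Scores) * !dat$Reversed is important, because it negates the TRUE/FALSE indicators. Note that, as written, rows where Reversed is TRUE contribute their raw score and rows where it is FALSE contribute 101 minus the score. With the key above, res is 37.66667, which matches the expected mean in the question.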
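
To reuse this across many data sets, the steps can be wrapped in a function. The following is a minimal sketch under a few assumptions: survey_mean, reverse_key, p1, p2 and datasets are hypothetical names; Reversed = TRUE is taken here to mean that the item is scored as 101 minus the raw score, which is why the values in reverse_key are flipped relative to the key above; and the function stops if a question is missing from the key.

```r
# Assumed convention: Reversed = TRUE means the item is scored as 101 - raw score
reverse_key <- data.frame(
  Question = c("I am calm", "I am excited", "I feel tense",
               "I am worried", "I am satisfied", "I am relaxed"),
  Reversed = c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

# Hypothetical helper: mean score for one participant's data frame
survey_mean <- function(dat, key) {
  # Row of the key that corresponds to each response
  idx <- match(dat$Question, key$Question)
  if (anyNA(idx)) stop("some questions are missing from the key")
  reversed <- key$Reversed[idx]
  mean(ifelse(reversed, 101 - dat$Scores, dat$Scores))
}

p1 <- data.frame(
  Question = c("I feel tense", "I am calm", "I am excited",
               "I am worried", "I am satisfied", "I am relaxed"),
  Scores = c(76, 90, 52, 65, 90, 90)
)

# Same answers, with the questions presented in a different order
p2 <- p1[c(6, 3, 1, 5, 2, 4), ]

# One mean per data set
datasets <- list(participant1 = p1, participant2 = p2)
means <- sapply(datasets, survey_mean, key = reverse_key)
```

Using ifelse instead of the two products keeps the reversed and non-reversed cases side by side, and the explicit NA check turns a silently missing key entry into an error.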

Upvotes: 1
