Averaging specific row values to create new column in R

Question

I am trying to combine specific questions into new combined questions whose value is the average of the questions that were combined. The average should be computed separately for each id. In the example below, I am trying to combine questions 1 and 2 (abc and def) and generate a column holding the average of the two values for each participant (indicated by id).

This is an example of what the original dataframe looks like:

id  question  qnumber  value
1   abc       1        1
1   def       2        3
1   ghi       3        4
2   abc       1        2
2   def       2        4
2   ghi       3        1

This is what I would like the dataframe to look like.

id  question  qnumber  value
1   abcdef    1        2
1   ghi       3        4
2   abcdef    1        3
2   ghi       3        1

In my actual dataset, I have 17 questions and would like to combine 3 pairs, yielding 14 questions (11 independent and 3 from the combined questions). I do not care if the resulting "question" column has the question names combined in the same style as above, but I thought this would make things easier to understand. The qnumber column isn't very important, but I wasn't sure if it would be easier to combine certain rows on the basis of a number (as in "qnumber") as opposed to on the basis of a string (as in "question"), so I included it.

Julia Silge · Accepted Answer

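I would use the ever-useful case_when() from dplyr to take care of that: give the questions you want to combine a shared label, then group by id and that label and take the mean.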

library(tidyverse)


df <- tribble(~id,  ~question,  ~qnumber,  ~value,
              1,   "abc",       1,        1,
              1,   "def",       2,        3,
              1,   "ghi",       3,        4,
              2,   "abc",       1,        2,
              2,   "def",       2,        4,
              2,   "ghi",       3,        1)

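df %>%
    # give the questions to be combined a shared label
    mutate(question = case_when(question %in% c("abc",
                                                "def") ~ "abcdef",
                                TRUE ~ question)) %>%
    # one row per participant and (relabelled) question, averaging the values
    group_by(id, question) %>%
    summarise(value = mean(value)) %>%
    ungroup()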
#> # A tibble: 4 x 3
#>      id question value
#>   <dbl> <chr>    <dbl>
#> 1    1. abcdef      2.
#> 2    1. ghi         4.
#> 3    2. abcdef      3.
#> 4    2. ghi         1.

Created on 2018-04-26 by the reprex package (v0.2.0).
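
The question also asks about combining several pairs and about matching on qnumber rather than on the question string. The following is only a sketch of one way that might look, not part of the original answer. It assumes dplyr 1.0.0 or later (for the .groups argument), and the names combine_map and combined are made up for illustration. It keeps the smallest qnumber of each combined group, which is an assumption chosen to match the desired output shown in the question.

```r
library(dplyr)
library(tibble)

# Same example data as in the question
df <- tribble(~id, ~question, ~qnumber, ~value,
              1,   "abc",     1,        1,
              1,   "def",     2,        3,
              1,   "ghi",     3,        4,
              2,   "abc",     1,        2,
              2,   "def",     2,        4,
              2,   "ghi",     3,        1)

# Hypothetical lookup table: original qnumber -> label of the combined question.
# Each further pair to combine would add two more rows here.
combine_map <- tribble(~qnumber, ~combined,
                       1,        "abcdef",
                       2,        "abcdef")

result <- df %>%
    # attach the combined label where one exists (NA for all other questions)
    left_join(combine_map, by = "qnumber") %>%
    # use the combined label if present, otherwise keep the original question
    mutate(question = coalesce(combined, question)) %>%
    group_by(id, question) %>%
    # assumption: keep the smallest qnumber of each group, average the values
    summarise(qnumber = min(qnumber),
              value = mean(value),
              .groups = "drop")

print(result)
```

With the example data this should give four rows, one per id for "abcdef" and one per id for "ghi". Keeping the pairs in a small table rather than in nested case_when() branches means that going from one pair to three only changes the lookup, not the pipeline.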
