Reputation: 1000
I am trying to combine specific questions to form new combined questions with values that are the average of the questions that were combined. I only want the average value for a specific id. In the example below, I am trying to combine questions 1 and 2 (abc and def) and then generate a column that includes the average of the two values for each participant (indicated by id).
This is an example of what the original dataframe looks like:
id question qnumber value
1 abc 1 1
1 def 2 3
1 ghi 3 4
2 abc 1 2
2 def 2 4
2 ghi 3 1
This is what I would like the dataframe to look like.
id question qnumber value
1 abcdef 1 2
1 ghi 3 4
2 abcdef 1 3
2 ghi 3 1
In my actual dataset, I have 17 questions and would like to combine 3 pairs, yielding 14 questions (11 independent and 3 from the combined questions). I do not care if the resulting "question" column has the question names combined in the same style as above, but I thought this would make things easier to understand. The qnumber column isn't very important, but I wasn't sure if it would be easier to combine certain rows on the basis of a number (as in "qnumber") as opposed to on the basis of a string (as in "question"), so I included it.
Upvotes: 1
Views: 1453
Reputation: 11663
I would use the ever useful case_when()
from dplyr to take care of that.
library(tidyverse)
df <- tribble(~id, ~question, ~qnumber, ~value,
1, "abc", 1, 1,
1, "def", 2, 3,
1, "ghi", 3, 4,
2, "abc", 1, 2,
2, "def", 2, 4,
2, "ghi", 3, 1)
df %>%
mutate(question = case_when(question %in% c("abc",
"def") ~ "abcdef",
TRUE ~ question)) %>%
group_by(id, question) %>%
summarise(value = mean(value)) %>%
ungroup
#> # A tibble: 4 x 3
#> id question value
#> <dbl> <chr> <dbl>
#> 1 1. abcdef 2.
#> 2 1. ghi 4.
#> 3 2. abcdef 3.
#> 4 2. ghi 1.
Created on 2018-04-26 by the reprex package (v0.2.0).
Upvotes: 3
Reputation: 10671
Here is one way to tackle it, using tidyverse
and a lookup vector to handle the question combining.
dat <- read.table(text = "id question qnumber value
1 abc 1 1
1 def 2 3
1 ghi 3 4
2 abc 1 2
2 def 2 4
2 ghi 3 1", header = T, stringsAsFactors = FALSE)
library(tidyverse)
# set up named vector for combining (flexible in the real world with more pairs)
lu <- rep("abcdef", 2) %>% set_names(c("abc", "def"))
dat %>% mutate(new_question = ifelse(question %in% names(lu), # add in new column for combination
lu[question],
question)) %>%
group_by(new_question, id) %>% # group via relevant columns: id and new_question
summarise(avg_val = mean(value)) # calculate your averages
# A tibble: 4 x 3
# Groups: new_question [?]
new_question id avg_val
<chr> <int> <dbl>
1 abcdef 1 2
2 abcdef 2 3
3 ghi 1 4
4 ghi 2 1
Here I dropped the qnumber
since it wasn't critical and it might make more sense to add it back after summarising.
Upvotes: 3