Reputation: 11697
Let's say I have the following "key" dataset:
key = read.table(text = "question r_answer d_answer
20 A B
21 B A
22 A B
23 B A
24 A B
25 B A", header = T)
> key
question r_answer d_answer
1 20 A B
2 21 B A
3 22 A B
4 23 B A
5 24 A B
6 25 B A
This tells me, for a given question, what answer an "R" would give, and what answer a "D" would give.
Now let's say this is the dataset:
data = read.table(text = "person_id question answer
1 20 A
1 21 B
1 22 A
1 23 B
1 24 A
1 25 B
2 20 A
2 21 A
2 23 A
2 24 B
2 25 B", header = T)
>data
person_id question answer
1 1 20 A
2 1 21 B
3 1 22 A
4 1 23 B
5 1 24 A
6 1 25 B
7 2 20 A
8 2 21 A
9 2 23 A
10 2 24 B
11 2 25 B
This tells me, for a given person, what their actual answers were. I want to create an answer_type
column in data that is equal to either r_answer
or d_answer
depending on the values listed in key. The resulting output would be:
person_id question answer answer_type
1 1 20 A r_answer
2 1 21 B r_answer
3 1 22 A r_answer
4 1 23 B r_answer
5 1 24 A r_answer
6 1 25 B r_answer
7 2 20 A r_answer
8 2 21 A d_answer
9 2 23 A d_answer
10 2 24 B d_answer
11 2 25 B r_answer
I have a feeling that the answer will involve merging from dplyr, but I can't quite figure it out.
Upvotes: 5
Views: 902
Reputation: 101956
Here is a base R solution
data$answer_type <- apply(data, 1, function(v) {
colnames(key)[which(key[match(v["question"],key$question),] == v["answer"])]
})
such that
> data
person_id question answer answer_type
1 1 20 A r_answer
2 1 21 B r_answer
3 1 22 A r_answer
4 1 23 B r_answer
5 1 24 A r_answer
6 1 25 B r_answer
7 2 20 A r_answer
8 2 21 A d_answer
9 2 23 A d_answer
10 2 24 B d_answer
11 2 25 B r_answer
Upvotes: 1
Reputation: 23574
If you want the same approach with data.table. You can do the following. melt()
is working like pivot_longer()
in order to convert key
to a long-format data. I created columns names in the way that I can join the two data sets easily. Then, you merge\join with data
.
mykey <- melt(setDT(key), id.vars = "question", measure.vars = patterns("answer"),
variable.name = "answer_type", value.name = "answer")
mykey[setDT(data), on = c("question", "answer")]
question answer_type answer person_id
1: 20 r_answer A 1
2: 21 r_answer B 1
3: 22 r_answer A 1
4: 23 r_answer B 1
5: 24 r_answer A 1
6: 25 r_answer B 1
7: 20 r_answer A 2
8: 21 d_answer A 2
9: 23 d_answer A 2
10: 24 d_answer B 2
11: 25 r_answer B 2
Upvotes: 2
Reputation: 887391
We can do
library(dplyr)
library(tidyr)
data %>%
group_split(person_id) %>%
map_dfr(~ .x %>%
left_join(pivot_longer(key, cols = -question),
by = c('question', 'answer' = 'value')))
# A tibble: 11 x 4
# person_id question answer name
# <int> <int> <fct> <chr>
# 1 1 20 A r_answer
# 2 1 21 B r_answer
# 3 1 22 A r_answer
# 4 1 23 B r_answer
# 5 1 24 A r_answer
# 6 1 25 B r_answer
# 7 2 20 A r_answer
# 8 2 21 A d_answer
# 9 2 23 A d_answer
#10 2 24 B d_answer
#11 2 25 B r_answer
Upvotes: 4
Reputation: 39868
One dplyr
and tidyr
option could be:
data %>%
left_join(key %>%
pivot_longer(-question, names_to = "answer_type"), by = c("question" = "question",
"answer" = "value"))
person_id question answer answer_type
1 1 20 A r_answer
2 1 21 B r_answer
3 1 22 A r_answer
4 1 23 B r_answer
5 1 24 A r_answer
6 1 25 B r_answer
7 2 20 A r_answer
8 2 21 A d_answer
9 2 23 A d_answer
10 2 24 B d_answer
11 2 25 B r_answer
Upvotes: 5