Parseltongue
Parseltongue

Reputation: 11697

Fill column based on look-up table in R

Let's say I have the following "key" dataset:

key = read.table(text = "question r_answer d_answer
                  20 A B
                  21 B A 
                  22 A B
                  23 B A
                  24 A B
                  25 B A", header = T)

> key
  question r_answer d_answer
1       20        A        B
2       21        B        A
3       22        A        B
4       23        B        A
5       24        A        B
6       25        B        A

This tells me, for a given question, what answer an "R" would give, and what answer a "D" would give.

Now let's say this is the dataset:

data =  read.table(text = "person_id question answer
1 20 A
1 21 B
1 22 A
1 23 B
1 24 A
1 25 B
2 20 A
2 21 A
2 23 A
2 24 B
2 25 B", header = T)

>data
   person_id question answer
1          1       20      A
2          1       21      B
3          1       22      A
4          1       23      B
5          1       24      A
6          1       25      B
7          2       20      A
8          2       21      A
9          2       23      A
10         2       24      B
11         2       25      B

This tells me, for a given person, what their actual answers were. I want to create an answer_type column in data that is equal to either r_answer or d_answer depending on the values listed in key. The resulting output would be:

   person_id question answer answer_type
1          1       20      A    r_answer
2          1       21      B    r_answer
3          1       22      A    r_answer
4          1       23      B    r_answer
5          1       24      A    r_answer
6          1       25      B    r_answer
7          2       20      A    r_answer
8          2       21      A    d_answer
9          2       23      A    d_answer
10         2       24      B    d_answer
11         2       25      B    r_answer

I have a feeling that the answer will involve merging from dplyr, but I can't quite figure it out.

Upvotes: 5

Views: 902

Answers (4)

ThomasIsCoding
ThomasIsCoding

Reputation: 101956

Here is a base R solution

data$answer_type <- apply(data, 1, function(v) {
  colnames(key)[which(key[match(v["question"],key$question),] == v["answer"])]
})

such that

> data
   person_id question answer answer_type
1          1       20      A    r_answer
2          1       21      B    r_answer
3          1       22      A    r_answer
4          1       23      B    r_answer
5          1       24      A    r_answer
6          1       25      B    r_answer
7          2       20      A    r_answer
8          2       21      A    d_answer
9          2       23      A    d_answer
10         2       24      B    d_answer
11         2       25      B    r_answer

Upvotes: 1

jazzurro
jazzurro

Reputation: 23574

If you want the same approach with data.table. You can do the following. melt() is working like pivot_longer() in order to convert key to a long-format data. I created columns names in the way that I can join the two data sets easily. Then, you merge\join with data.

mykey <- melt(setDT(key), id.vars = "question", measure.vars = patterns("answer"),
              variable.name = "answer_type", value.name = "answer")

mykey[setDT(data), on = c("question", "answer")]

    question answer_type answer person_id
 1:       20    r_answer      A         1
 2:       21    r_answer      B         1
 3:       22    r_answer      A         1
 4:       23    r_answer      B         1
 5:       24    r_answer      A         1
 6:       25    r_answer      B         1
 7:       20    r_answer      A         2
 8:       21    d_answer      A         2
 9:       23    d_answer      A         2
10:       24    d_answer      B         2
11:       25    r_answer      B         2 

Upvotes: 2

akrun
akrun

Reputation: 887391

We can do

library(dplyr)
library(tidyr)
data %>%
    group_split(person_id) %>%
    map_dfr(~ .x %>% 
        left_join(pivot_longer(key,  cols = -question),
              by = c('question',  'answer' = 'value')))
# A tibble: 11 x 4
#   person_id question answer name    
#       <int>    <int> <fct>  <chr>   
# 1         1       20 A      r_answer
# 2         1       21 B      r_answer
# 3         1       22 A      r_answer
# 4         1       23 B      r_answer
# 5         1       24 A      r_answer
# 6         1       25 B      r_answer
# 7         2       20 A      r_answer
# 8         2       21 A      d_answer
# 9         2       23 A      d_answer
#10         2       24 B      d_answer
#11         2       25 B      r_answer

Upvotes: 4

tmfmnk
tmfmnk

Reputation: 39868

One dplyr and tidyr option could be:

data %>%
 left_join(key %>%
            pivot_longer(-question, names_to = "answer_type"), by = c("question" = "question",
                                                                      "answer" = "value"))

   person_id question answer answer_type
1          1       20      A    r_answer
2          1       21      B    r_answer
3          1       22      A    r_answer
4          1       23      B    r_answer
5          1       24      A    r_answer
6          1       25      B    r_answer
7          2       20      A    r_answer
8          2       21      A    d_answer
9          2       23      A    d_answer
10         2       24      B    d_answer
11         2       25      B    r_answer

Upvotes: 5

Related Questions