Paul
Paul

Reputation: 107

Using case_when and lag() together?

Apologies in advance that this question is massive- I'm trying to include as much detail as possible. The example table below is output from a behavioral task. In each of the 448 trials, there are visual masks at the beginning and end of each trial, one visual prime (six possible primes, indicated by codes 1:6), and one target to which a response is made (six possible, indicated by codes 7:12). Accuracy to the target is noted by either 1 (correct) or 0 (incorrect).

I now need to add new codes to the Response rows. These news codes will have to be new numbers that represent each combination of target type and accuracy. For example, if there was a square target (code 12), and the participant was correct (Accuracy=1), the new code would be 25; if there was a square target (12) and they were incorrect (Accuracy=0), the new code would be 26, and so on for each combination of target (7:12) and accuracy (0:1) (12 combos total).

Trial Label Code Accuracy
1 Pre-Mask 13 1
1 BluePrime 6 1
1 SqureTarget 12 1
1 Post-Mask 14 1
1 Response NA 1
2 Pre-Mask 13 0
2 RedPrime 4 0
2 CircleTarget 9 0
2 Post-Mask 14 0
2 Response NA 0

Basically, I need a conditional statement that replaces the NA's based on information from two different columns from two rows prior to the missing values. The output should look like this:

Trial Label Code Accuracy
1 Pre-Mask 13 1
1 BluePrime 6 1
1 SqureTarget 12 1
1 Post-Mask 14 1
1 Response 25 1
2 Pre-Mask 13 0
2 RedPrime 4 0
2 CircleTarget 9 0
2 Post-Mask 14 0
2 Response 20 0

This is the table with all the new possible combinations:

When target is.. And Accuracy is... New Code #
7 1 15
7 0 16
8 1 17
8 0 18
9 1 19
9 0 20
10 1 21
10 0 22
11 1 23
11 0 24
12 1 25
12 0 26

I have tried different combinations of lag() and case_when(), but as I start typing, I realize I don't really know how to structure the argument (I'm still very new to R and programming in general). I know I could manually input the numbers in Excel, but I'm hoping to automate this to some extent in R. I realize this information could also just be put into a new row pretty easily, but in order for the next program I am using to understand the code values, they all will need to be in the same column in the order in which they occur in real time (the response input will always follow the post-visual mask, code 14).

If any clarification is needed please let me know. Any help is appreciated!

Upvotes: 0

Views: 413

Answers (2)

Hubert
Hubert

Reputation: 131

You can do it this way:

  1. I've read the first table as "acc"
  2. I've read the last table as "target"
merge <- merge(x = acc, y = target, by = c("Code", "Accuracy"), all.x= TRUE)
   Code Accuracy Trial        Label newCode
1     4        0     2     RedPrime      NA
2     6        1     1    BluePrime      NA
3     9        0     2 CircleTarget      20
4    12        1     1  SqureTarget      25
5    13        0     2     Pre-Mask      NA
6    13        1     1     Pre-Mask      NA
7    14        0     2    Post-Mask      NA
8    14        1     1    Post-Mask      NA
9    NA        0     2     Response      NA
10   NA        1     1     Response      NA

newCodes <- merge[!is.na(merge$newCode),c("Trial", "newCode")]
merge <- merge(merge, newCodes, "Trial")  

  Trial Code Accuracy        Label newCode.x newCode.y
1      1    6        1    BluePrime        NA        25
2      1   12        1  SqureTarget        25        25
3      1   13        1     Pre-Mask        NA        25
4      1   14        1    Post-Mask        NA        25
5      1   NA        1     Response        NA        25
6      2    4        0     RedPrime        NA        20
7      2    9        0 CircleTarget        20        20
8      2   13        0     Pre-Mask        NA        20
9      2   14        0    Post-Mask        NA        20
10     2   NA        0     Response        NA        20

merge[merge$Label =='Response',]$Code <- merge[merge$Label =='Response',]$newCode.y

   Trial Code Accuracy        Label newCode.x newCode.y
1      1    6        1    BluePrime        NA        25
2      1   12        1  SqureTarget        25        25
3      1   13        1     Pre-Mask        NA        25
4      1   14        1    Post-Mask        NA        25
5      1   25        1     Response        NA        25
6      2    4        0     RedPrime        NA        20
7      2    9        0 CircleTarget        20        20
8      2   13        0     Pre-Mask        NA        20
9      2   14        0    Post-Mask        NA        20
10     2   20        0     Response        NA        20

Sorry for not using case when, I hope you can use it.

Upvotes: 0

Gregor Thomas
Gregor Thomas

Reputation: 146070

I think this follows your logic:

library(dplyr)
library(tidyr)

input %>%
  ## join the new values to the original data on their current rows
  left_join(lookup, by = c("Code", "Accuracy")) %>%
  ## do all subsequent operations within each Trial group
  group_by(Trial) %>%
  ## Fill in missing `New_Code` values based on the previous non-missing value
  fill(New_Code, .direction = "down") %>%
  ## replace any `Code` missing values with the `New_Code` value
  mutate(Code = coalesce(Code, New_Code))

# # A tibble: 10 x 5
# # Groups:   Trial [2]
# Trial Label         Code Accuracy New_Code
# <int> <chr>        <int>    <int>    <int>
#  1     1 Pre-Mask        13        1       NA
#  2     1 BluePrime        6        1       NA
#  3     1 SqureTarget     12        1       25
#  4     1 Post-Mask       14        1       25
#  5     1 Response        25        1       25
#  6     2 Pre-Mask        13        0       NA
#  7     2 RedPrime         4        0       NA
#  8     2 CircleTarget     9        0       20
#  9     2 Post-Mask       14        0       20
# 10     2 Response        20        0       20

You can, of course, drop the New_Code column by adding ... %>% select(-New_Code) -- I leave it in above for clarity.


Using this data:

input = read.table(text = 'Trial    Label   Code    Accuracy
1   Pre-Mask    13  1
1   BluePrime   6   1
1   SqureTarget 12  1
1   Post-Mask   14  1
1   Response    NA  1
2   Pre-Mask    13  0
2   RedPrime    4   0
2   CircleTarget    9   0
2   Post-Mask   14  0
2   Response    NA  0', header = T)

lookup = read.table(text = 'Code    Accuracy    New_Code
7   1   15
7   0   16
8   1   17
8   0   18
9   1   19
9   0   20
10  1   21
10  0   22
11  1   23
11  0   24
12  1   25
12  0   26', header = TRUE)

Upvotes: 1

Related Questions