Reputation: 107
Apologies in advance that this question is massive- I'm trying to include as much detail as possible. The example table below is output from a behavioral task. In each of the 448 trials, there are visual masks at the beginning and end of each trial, one visual prime (six possible primes, indicated by codes 1:6), and one target to which a response is made (six possible, indicated by codes 7:12). Accuracy to the target is noted by either 1 (correct) or 0 (incorrect).
I now need to add new codes to the Response rows. These news codes will have to be new numbers that represent each combination of target type and accuracy. For example, if there was a square target (code 12), and the participant was correct (Accuracy=1), the new code would be 25; if there was a square target (12) and they were incorrect (Accuracy=0), the new code would be 26, and so on for each combination of target (7:12) and accuracy (0:1) (12 combos total).
Trial | Label | Code | Accuracy |
---|---|---|---|
1 | Pre-Mask | 13 | 1 |
1 | BluePrime | 6 | 1 |
1 | SqureTarget | 12 | 1 |
1 | Post-Mask | 14 | 1 |
1 | Response | NA | 1 |
2 | Pre-Mask | 13 | 0 |
2 | RedPrime | 4 | 0 |
2 | CircleTarget | 9 | 0 |
2 | Post-Mask | 14 | 0 |
2 | Response | NA | 0 |
Basically, I need a conditional statement that replaces the NA's based on information from two different columns from two rows prior to the missing values. The output should look like this:
Trial | Label | Code | Accuracy |
---|---|---|---|
1 | Pre-Mask | 13 | 1 |
1 | BluePrime | 6 | 1 |
1 | SqureTarget | 12 | 1 |
1 | Post-Mask | 14 | 1 |
1 | Response | 25 | 1 |
2 | Pre-Mask | 13 | 0 |
2 | RedPrime | 4 | 0 |
2 | CircleTarget | 9 | 0 |
2 | Post-Mask | 14 | 0 |
2 | Response | 20 | 0 |
This is the table with all the new possible combinations:
When target is.. | And Accuracy is... | New Code # |
---|---|---|
7 | 1 | 15 |
7 | 0 | 16 |
8 | 1 | 17 |
8 | 0 | 18 |
9 | 1 | 19 |
9 | 0 | 20 |
10 | 1 | 21 |
10 | 0 | 22 |
11 | 1 | 23 |
11 | 0 | 24 |
12 | 1 | 25 |
12 | 0 | 26 |
I have tried different combinations of lag() and case_when(), but as I start typing, I realize I don't really know how to structure the argument (I'm still very new to R and programming in general). I know I could manually input the numbers in Excel, but I'm hoping to automate this to some extent in R. I realize this information could also just be put into a new row pretty easily, but in order for the next program I am using to understand the code values, they all will need to be in the same column in the order in which they occur in real time (the response input will always follow the post-visual mask, code 14).
If any clarification is needed please let me know. Any help is appreciated!
Upvotes: 0
Views: 413
Reputation: 131
You can do it this way:
merge <- merge(x = acc, y = target, by = c("Code", "Accuracy"), all.x= TRUE)
Code Accuracy Trial Label newCode
1 4 0 2 RedPrime NA
2 6 1 1 BluePrime NA
3 9 0 2 CircleTarget 20
4 12 1 1 SqureTarget 25
5 13 0 2 Pre-Mask NA
6 13 1 1 Pre-Mask NA
7 14 0 2 Post-Mask NA
8 14 1 1 Post-Mask NA
9 NA 0 2 Response NA
10 NA 1 1 Response NA
newCodes <- merge[!is.na(merge$newCode),c("Trial", "newCode")]
merge <- merge(merge, newCodes, "Trial")
Trial Code Accuracy Label newCode.x newCode.y
1 1 6 1 BluePrime NA 25
2 1 12 1 SqureTarget 25 25
3 1 13 1 Pre-Mask NA 25
4 1 14 1 Post-Mask NA 25
5 1 NA 1 Response NA 25
6 2 4 0 RedPrime NA 20
7 2 9 0 CircleTarget 20 20
8 2 13 0 Pre-Mask NA 20
9 2 14 0 Post-Mask NA 20
10 2 NA 0 Response NA 20
merge[merge$Label =='Response',]$Code <- merge[merge$Label =='Response',]$newCode.y
Trial Code Accuracy Label newCode.x newCode.y
1 1 6 1 BluePrime NA 25
2 1 12 1 SqureTarget 25 25
3 1 13 1 Pre-Mask NA 25
4 1 14 1 Post-Mask NA 25
5 1 25 1 Response NA 25
6 2 4 0 RedPrime NA 20
7 2 9 0 CircleTarget 20 20
8 2 13 0 Pre-Mask NA 20
9 2 14 0 Post-Mask NA 20
10 2 20 0 Response NA 20
Sorry for not using case when, I hope you can use it.
Upvotes: 0
Reputation: 146070
I think this follows your logic:
library(dplyr)
library(tidyr)
input %>%
## join the new values to the original data on their current rows
left_join(lookup, by = c("Code", "Accuracy")) %>%
## do all subsequent operations within each Trial group
group_by(Trial) %>%
## Fill in missing `New_Code` values based on the previous non-missing value
fill(New_Code, .direction = "down") %>%
## replace any `Code` missing values with the `New_Code` value
mutate(Code = coalesce(Code, New_Code))
# # A tibble: 10 x 5
# # Groups: Trial [2]
# Trial Label Code Accuracy New_Code
# <int> <chr> <int> <int> <int>
# 1 1 Pre-Mask 13 1 NA
# 2 1 BluePrime 6 1 NA
# 3 1 SqureTarget 12 1 25
# 4 1 Post-Mask 14 1 25
# 5 1 Response 25 1 25
# 6 2 Pre-Mask 13 0 NA
# 7 2 RedPrime 4 0 NA
# 8 2 CircleTarget 9 0 20
# 9 2 Post-Mask 14 0 20
# 10 2 Response 20 0 20
You can, of course, drop the New_Code
column by adding ... %>% select(-New_Code)
-- I leave it in above for clarity.
Using this data:
input = read.table(text = 'Trial Label Code Accuracy
1 Pre-Mask 13 1
1 BluePrime 6 1
1 SqureTarget 12 1
1 Post-Mask 14 1
1 Response NA 1
2 Pre-Mask 13 0
2 RedPrime 4 0
2 CircleTarget 9 0
2 Post-Mask 14 0
2 Response NA 0', header = T)
lookup = read.table(text = 'Code Accuracy New_Code
7 1 15
7 0 16
8 1 17
8 0 18
9 1 19
9 0 20
10 1 21
10 0 22
11 1 23
11 0 24
12 1 25
12 0 26', header = TRUE)
Upvotes: 1