Reputation: 185
I have a data frame:
stim1 stim2 Chosen Rejected
1: 2 1 2 1
2: 3 2 2 3
3: 3 1 1 3
4: 2 3 3 2
5: 1 3 1 3
My goal is to add, for each trial, a column specifying whether each stimulus was most recently (in previous trials) Chosen or Rejected.
Desired outcome:
stim1 stim2 Chosen Rejected Previous_stim1 Previous_stim2
1: 2 1 2 1 NaN NaN
2: 3 2 2 3 NaN Chosen
3: 3 1 1 3 Rejected Rejected
4: 2 3 3 2 Chosen Rejected
5: 1 3 1 3 Chosen Chosen
Any help will be greatly appreciated!
TarJae had a really helpful suggestion that correctly categorized the piece of the data frame I shared. I didn't mention that it is part of a larger data frame, and for some reason this method quickly stops classifying correctly:
stim1 stim2 Chosen Rejected Previous_stim1 Previous_stim2
1: 2 1 2 1 <NA> <NA>
2: 3 2 2 3 <NA> Chosen
3: 3 1 1 3 Rejected Rejected
4: 2 3 3 2 Chosen Rejected
5: 1 3 1 3 Chosen Chosen
6: 2 1 1 2 Chosen Chosen
7: 2 3 2 3 Chosen Chosen
8: 3 1 1 3 Chosen Chosen
9: 2 1 2 1 Chosen Chosen
For example, in row 6 stim1==2. Most recently, 2 was rejected (row 4), but the method classified it as Chosen.
Any idea why this happens?
Thank you again for everyone's help.
Update 2
Thank you so much for your help. Now suppose I also have a column with the "outcome":
stim1 stim2 Chosen Rejected outcome Previous_stim 1 Previous_stim 2
1: 15 13 15 13 1 <NA> <NA>
2: 13 14 14 13 1 Rejected <NA>
3: 14 15 14 15 1 Chosen Chosen
4: 14 13 14 13 0 Chosen Rejected
5: 13 15 13 15 0 Rejected Rejected
6: 14 15 14 15 1 Chosen Rejected
7: 15 13 15 13 1 Rejected Chosen
8: 14 15 14 15 0 Chosen Chosen
I want to encode whether it was
1) most recently chosen and outcome=1 (can be coded as 1)
2) most recently chosen and outcome=0 (can be coded as 2)
3) most recently rejected and outcome=1 (can be coded as 3)
4) most recently rejected and outcome=0 (can be coded as 4)
Is there an easy way to modify the code to make that happen?
Desired output
stim1 stim2 Chosen Rejected outcome Previous_stim 1 Previous_stim 2 Left_type right_type
1 2 3 2 3 1 <NA> <NA> NaN NaN
2 1 3 3 1 1 <NA> Rejected NaN 3
3 2 1 1 2 1 Chosen Rejected 1 3
4 1 2 1 2 0 Chosen Rejected 1 3
5 3 1 3 1 1 Chosen Chosen 1 2
LAST FOLLOW UP
Finally, I would like to add a column that checks whether the stimulus chosen in that previous trial (that is, the most recent trial in which the stimulus in question was rejected) is the same as my current alternative stimulus.
For example, if I have:
stim1 stim2 Chosen Rejected Previous_stim1 Previous_stim2
1: 2 1 2 1 NaN NaN
2: 3 2 2 3 NaN Chosen
3: 3 1 1 3 Rejected Rejected
4: 2 3 3 2 Chosen Rejected
5: 1 3 1 3 Chosen Chosen
And here is how I would update my table:
In trial 3, stim1 (i.e. 3) was previously rejected in favor of 2 (in trial 2), not in favor of 1 (the current alternative), so Current_alternative_left=0.
Similarly, stim2 (i.e. 1) was previously rejected, but in favor of 2 (in trial 1), so Current_alternative_right=0.
On the other hand, in trial 4, stim1=2 was previously chosen over the same stimulus it is currently pitted against (3), so Current_alternative_left=1.
Desired Output
stim1 stim2 Chosen Rejected outcome Previous_stim 1 Previous_stim 2 Left_type right_type
1 2 3 2 3 1 <NA> <NA> NaN NaN
2 1 3 3 1 1 <NA> Rejected NaN 3
3 2 1 1 2 1 Chosen Rejected 1 3
4 1 2 1 2 0 Chosen Rejected 1 3
5 3 1 3 1 1 Chosen Chosen 1 2
Current_alternative_left Current_alternative_right
NaN NaN
NaN 0
0 0
1 0
1 0
I am new to data.table, but I tried to adapt ThomasIsCoding's function to return this as well:
h <- function(stim, cr) {
  stim_chosen <- rep(NA, length(stim))
  for (k in seq_along(stim)[-1]) {
    ind <- which(cr[1:(k - 1), , drop = FALSE] == stim[k], arr.ind = TRUE)
    if (length(ind)) {
      stim_chosen[k] <- stim[tail(ind, 1)[, "row"]]
    }
  }
  stim_chosen
}
setDT(df)[
  ,
  paste0("Chosen_Last", 1:2) := lapply(
    .(stim1, stim2),
    h,
    cr = cbind(Chosen, Rejected)
  )
]
However, this does not quite give me the correct answer. Does anyone know where I am going wrong?
Upvotes: 4
Views: 144
Reputation: 101099
For Update 2
setDT(df)[
  ,
  paste0("Previous_stim", 1:2) := lapply(
    .(stim1, stim2),
    f,
    cr = cbind(Chosen, Rejected)
  )
][
  ,
  paste0(c("left", "right"), "type") := lapply(.SD, function(x) 2 * (x == "Rejected") + 2 - outcome),
  .SDcols = patterns("Previous")
][]
gives
stim1 stim2 Chosen Rejected outcome Previous_stim1 Previous_stim2 lefttype
1: 2 1 2 1 1 <NA> <NA> NA
2: 3 2 2 3 1 <NA> Chosen NA
3: 3 1 1 3 1 Rejected Rejected 3
4: 2 3 3 2 0 Chosen Rejected 2
5: 1 3 1 3 0 Chosen Chosen 2
6: 2 1 1 2 1 Rejected Chosen 3
7: 2 3 2 3 1 Rejected Rejected 3
8: 3 1 1 3 0 Rejected Chosen 4
9: 2 1 2 1 0 Chosen Chosen 2
righttype
1: NA
2: 1
3: 3
4: 4
5: 2
6: 1
7: 3
8: 2
9: 2
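The arithmetic inside the lapply call is what turns each (status, outcome) pair into one of the four codes listed in the question. Below is a minimal base R sketch of that mapping on its own. The helper name type_code and the input vectors are made up for illustration and are not part of the answer's code. Note that outcome is taken from the current row, exactly as in the expression above.

```r
# Hypothetical helper wrapping the expression used in the answer
type_code <- function(status, outcome) {
  # "Chosen" contributes 0, "Rejected" contributes 2; outcome 1 -> +1, outcome 0 -> +2
  2 * (status == "Rejected") + 2 - outcome
}

# Made-up inputs covering the four combinations plus a missing status
status  <- c("Chosen", "Chosen", "Rejected", "Rejected", NA)
outcome <- c(1, 0, 1, 0, 1)

codes <- type_code(status, outcome)
print(codes)  # 1 2 3 4 NA
```

A missing status propagates as NA, which is why the first rows of lefttype and righttype are NA.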
Data
> dput(df)
structure(list(stim1 = c(2L, 3L, 3L, 2L, 1L, 2L, 2L, 3L, 2L),
stim2 = c(1L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 1L), Chosen = c(2L,
2L, 1L, 3L, 1L, 1L, 2L, 1L, 2L), Rejected = c(1L, 3L, 3L,
2L, 3L, 2L, 3L, 3L, 1L), outcome = c(1, 1, 1, 0, 0, 1, 1,
0, 0)), class = "data.frame", row.names = c(NA, -9L))
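As per your update, you can try the following code, which defines a custom function f:
f <- function(stim, cr) {
  res <- rep(NA, length(stim))
  # the first trial has no history, so start from the second
  for (k in seq_along(stim)[-1]) {
    # (row, col) positions in the earlier trials where stim[k] was Chosen or Rejected
    ind <- which(cr[1:(k - 1), , drop = FALSE] == stim[k], arr.ind = TRUE)
    if (length(ind)) {
      # column name ("Chosen" or "Rejected") of the match in the latest row
      res[k] <- colnames(cr)[tail(ind[, "col"][order(ind[, "row"])], 1)]
    }
  }
  res
}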
setDT(df)[
  ,
  paste("Previous_stim", 1:2) := lapply(
    .(stim1, stim2),
    f,
    cr = cbind(Chosen, Rejected)
  )
][]
and you will see
> setDT(df)[, paste("Previous_stim",1:2) := lapply(.(stim1,stim2),f, cr = cbind(Chosen, Rejected))][]
stim1 stim2 Chosen Rejected Previous_stim 1 Previous_stim 2
1: 2 1 2 1 <NA> <NA>
2: 3 2 2 3 <NA> Chosen
3: 3 1 1 3 Rejected Rejected
4: 2 3 3 2 Chosen Rejected
5: 1 3 1 3 Chosen Chosen
6: 2 1 1 2 Rejected Chosen
7: 2 3 2 3 Rejected Rejected
8: 3 1 1 3 Rejected Chosen
9: 2 1 2 1 Chosen Chosen
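For the last follow-up in the question, one possible approach is sketched below in base R. It is not part of the original answer. The helper name same_alternative is hypothetical, and the sketch assumes that "the previous trial" means the most recent earlier trial in which the stimulus appeared. Under that assumption it reproduces the Current_alternative columns shown in the question for the five-row example.

```r
# Hypothetical helper: for each trial k, find the most recent earlier trial
# that contained stim[k] and test whether the other stimulus in that trial
# equals the current alternative alt[k].
same_alternative <- function(stim, alt, s1, s2) {
  res <- rep(NA_integer_, length(stim))
  for (k in seq_along(stim)[-1]) {
    prev <- seq_len(k - 1)
    hit <- which(s1[prev] == stim[k] | s2[prev] == stim[k])
    if (length(hit)) {
      j <- max(hit)  # most recent earlier trial with this stimulus
      opponent <- if (s1[j] == stim[k]) s2[j] else s1[j]
      res[k] <- as.integer(opponent == alt[k])
    }
  }
  res
}

# First five trials from the question
df5 <- data.frame(
  stim1 = c(2, 3, 3, 2, 1),
  stim2 = c(1, 2, 1, 3, 3)
)
df5$Current_alternative_left  <- same_alternative(df5$stim1, df5$stim2, df5$stim1, df5$stim2)
df5$Current_alternative_right <- same_alternative(df5$stim2, df5$stim1, df5$stim1, df5$stim2)
print(df5)
```

Because every trial has exactly one Chosen and one Rejected stimulus, looking at stim1 and stim2 is enough to identify the opponent; the Chosen and Rejected columns are not needed for this check.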
Upvotes: 2
Reputation: 78917
Here is a solution developed with the help of ThomasIsCoding's answer to "Check if value of column A is present in the same row or previous rows of column B". That question has additional answers that could also work for you; adapt whichever one fits. I chose the first one, provided by ThomasIsCoding.
The main task was to check the value in all previous rows of another column:
library(dplyr)
df %>%
  mutate(x = replace(rep(NA, length(Chosen)), match(stim1, lag(Chosen)) <= seq_along(stim1), "Chosen"),
         y = replace(rep(NA, length(Rejected)), match(stim1, lag(Rejected)) <= seq_along(stim1), "Rejected"),
         a = replace(rep(NA, length(Chosen)), match(stim2, lag(Chosen)) <= seq_along(stim2), "Chosen"),
         b = replace(rep(NA, length(Rejected)), match(stim2, lag(Rejected)) <= seq_along(stim2), "Rejected"),
         Previous_stim1 = coalesce(x, y),
         Previous_stim2 = coalesce(a, b)) %>%
  select(stim1, stim2, Chosen, Rejected, Previous_stim1, Previous_stim2)
stim1 stim2 Chosen Rejected Previous_stim1 Previous_stim2
1: 2 1 2 1 <NA> <NA>
2: 3 2 2 3 <NA> Chosen
3: 3 1 1 3 Rejected Rejected
4: 2 3 3 2 Chosen Rejected
5: 1 3 1 3 Chosen Chosen
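A possible reason this approach drifts on longer data, as reported in the question's update, is that match() returns only the first position of a value, not the most recent one, and coalesce(x, y) then prefers "Chosen" whenever x is filled. This is a reading of the code, not something stated in the answer. The sketch below uses the first five trials from the question to show the difference for stimulus 2; the variable names are made up for illustration.

```r
chosen_history   <- c(2, 2, 1, 3, 1)  # Chosen column, trials 1-5
rejected_history <- c(1, 3, 3, 2, 3)  # Rejected column, trials 1-5

# match() reports only the first trial in which 2 was chosen
first_chosen <- match(2, chosen_history)            # 1

# the most recent trials need which() plus max()
last_chosen   <- max(which(chosen_history == 2))    # 2
last_rejected <- max(which(rejected_history == 2))  # 4

# For trial 6 the later of the two events decides the label
if (last_rejected > last_chosen) "Rejected" else "Chosen"
```

Stimulus 2 was last rejected in trial 4, after it was last chosen in trial 2, so its most recent status before trial 6 is Rejected.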
Upvotes: 2
Reputation: 224
I think you need to correct the desired outcome in the above table. But it looks like you are looking for the lag verb, which can help solve this when used alongside if_else:
library(dplyr)
tbl <- tibble(stim1 = c(2, 3, 3, 2, 1), stim2 = c(1, 2, 1, 3, 3),
              chosen = c(2, 2, 1, 3, 1), rejected = c(1, 3, 3, 2, 3))
tbl %>%
  mutate(Previous_stim1 = if_else(lag(chosen) == lag(stim1), "Chosen", "Rejected")) %>%
  mutate(Previous_stim2 = if_else(lag(chosen) == lag(stim2), "Chosen", "Rejected"))
Upvotes: 1