Reputation: 142
I'm not new to R, but I am new to more advanced R techniques, and I've run into an issue. I have a moderately large dataset (not honking big, but about 65,000 rows in total across 18 trials), available here: https://www.dropbox.com/s/qn6fldj9z6w21b2/wtvstyr%20%282%29.csv?dl=0. I've been working with it as a data frame. Here is the task at hand:
I need to conditionally replace velocity values based on information from the direction and Y columns on a trial by trial basis. Here are my conditions: if direction is TRUE and the first 5 values of Y are <20, I need to replace all velocity values for Trial x with NA. If direction is TRUE and the first 5 values of Y are not <20, then I only need to do it on a case-by-case basis. If direction is FALSE and the first 5 values of Y are >180, I need to replace all velocity values for Trial x with NA. If direction is FALSE and the first 5 values of Y are not >180, then I only need to do it on a case-by-case basis.
I have the following code using dplyr from a few solutions that I've found on here (mainly from dplyr replacing na values in a column based on multiple conditions):
wtvstyr <- wtvstyr %>%
  mutate(velocity = case_when(direction == TRUE & Y < 20 ~ NA_real_, TRUE ~ velocity))
wtvstyr <- wtvstyr %>%
  mutate(velocity = case_when(direction == FALSE & Y > 180 ~ NA_real_, TRUE ~ velocity))
Which solves my problem on the case-by-case basis. As for discarding entire trials, I am rather stumped. I tried to do it with ifelse wrapped in a dplyr pipeline with an index for the first value, but I must confess I have no idea what I'm doing. Here is that bit of code for the TRUE/<20 conditional along these lines: Using If/Else on a data frame:
wtvstyr %>%
  group_by(Trial) %>%
  ifelse(case_when(direction == TRUE & Y[1]<20), velocity, NA_real_)
When I tried that, however, I got an unused argument error for NA.
Any help would be appreciated! And if there's a better way to do this entirely (re, masking values or some other way I don't know), any guidance would be fantastic. Thanks!
EDIT
Here is a reproducible mini-example of my dataset:
require(tidyverse)
set.seed(80)
Trial <- c(rep(1, 40), rep(2, 40))
Y <- c(sample(0:200, 80, replace=TRUE))
Time <- c(1:80)
Direction1 <- c(rep("TRUE", 10), rep("FALSE", 10))
Direction <- c(rep(Direction1, 4))
example <- data.frame(Trial, Time, Y, Direction)
example$Y2 = example$Y
shift <- function(x, n){
  c(x[-(seq(n))], rep(NA, n))
}
example$Y2 <- shift(example$Y2, 1)
example$velocity <- as.numeric(example$Y2) - as.numeric(example$Y)
example <- example[-c(5)]
#bit of code to remove velocities when they meet conditions I don't want:
example <- example %>%
  mutate(velocity = case_when(Direction == TRUE & Y < 20 ~ NA_real_, TRUE ~ velocity))
example <- example %>%
  mutate(velocity = case_when(Direction == FALSE & Y > 180 ~ NA_real_, TRUE ~ velocity))
With that second bit of code I can remove my case-by-case values (I hope this example clarifies what I mean). I'm still having trouble coding some kind of way to identify based on the first five values in Y which trials need to be discarded entirely.
So for example, in the first subsection of data where Trial==1 and Direction==TRUE, if any of the first five points of data within that subsection are <20, I need to discard all values in that section while Direction==TRUE. In my original dataset, Direction==TRUE and Direction==FALSE repeat a number of times. I need to treat each case separately.
With the seed I set, the first five Y values under Trial==1 and Direction==TRUE are 138, 40, 32, 192 and 99. Here, because no values are <20, I want to keep that trial and simply remove any values thereafter that meet those conditions (as done by the code above). However, when Trial==1 and Direction==FALSE, my values are 34, 187, 53, 79 and 8. Because 187>180, I need to remove all the values corresponding to Trial==1 and Direction==FALSE. However, later on, there is another case where Trial==1 and Direction==FALSE. I want to treat that case separately and evaluate it based on its own first five values. If I need to attach another column numbering which repetition of direction I'm on to keep them separated, I can do that.
Let me know if you need any more clarification and again, thank you for any help you can give.
Upvotes: 0
Views: 932
Reputation: 142
Using the code that GenesRus handed me, I was able to modify the code to select the trials that I want:
trialdata_filter <- trialdata %>%
  mutate(direction = as.logical(direction)) %>%
  mutate(is.special = case_when(
    direction == FALSE & Y > 180 ~ TRUE,
    direction == TRUE & Y < 20 ~ TRUE,
    TRUE ~ FALSE
  )) %>%
  group_by(bartrial) %>%
  filter(!any(is.special[1:25] == TRUE))
Thanks for the help!
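One caveat with the [1:25] indexing above, which only matters if some trial has fewer than 25 rows: indexing past the end of a vector pads with NA, so any() can return NA and filter() then drops that trial even though nothing in it was flagged. A minimal sketch with made-up data (the toy data frame and the by_index and by_head names are hypothetical), using head() as one possible alternative:

```r
library(dplyr)

# Hypothetical toy data: trial "a" has only 3 rows, trial "b" has 30,
# and no row is flagged as special.
toy <- data.frame(
  bartrial   = c(rep("a", 3), rep("b", 30)),
  is.special = FALSE
)

# x[1:25] on a 3-element vector yields 3 values plus 22 NAs, so any()
# returns NA for trial "a" and filter() treats NA as "drop".
by_index <- toy %>%
  group_by(bartrial) %>%
  filter(!any(is.special[1:25]))

# head() returns at most 25 values and never pads, so trial "a" is kept.
by_head <- toy %>%
  group_by(bartrial) %>%
  filter(!any(head(is.special, 25)))
```

If every trial has at least 25 rows, the two versions select the same trials.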
Upvotes: 1
Reputation: 1057
If I've understood roughly what you're looking for, the easiest way to do this is to create a special column that flags the rows you want to keep regardless of your other conditions, and to set those manually in a case_when. After that, you can group_by Trial and Direction and set up a filter to select just those Trial/Direction groups that qualify (where none of the first five values in that group is smaller than 20 or greater than 180, depending on Direction, or the row is otherwise a special case). From there, you could slice to get the top 5 rows, but in case you want the special rows too, I've used a second filter instead.
example %>%
  mutate(Direction = as.logical(Direction)) %>%
  mutate(is.special = case_when(
    Trial == 1 & Direction == FALSE & Y == 30 ~ TRUE,
    TRUE ~ FALSE ## This is a weird convention, but TRUE just catches every row that no earlier condition matched, and in this case we want those rows to be FALSE
  )) %>%
  group_by(Trial, Direction) %>%
  filter(
    is.special |
      (Direction == TRUE & !any(Y[1:5] < 20)) |
      (Direction == FALSE & !any(Y[1:5] > 180))
  ) %>%
  filter(
    is.special | row_number() <= 5
  )
any is a nice function that looks at the members of the group to see whether any of them meets the condition. Since I'm negating it, you might prefer to use all with the opposite comparisons, but I wanted to use the signs you had above to keep things consistent.
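The question also asks for repeated runs of the same Direction within a trial to be judged separately, and for velocity to be set to NA rather than rows removed. A minimal sketch of one way to do that, assuming Direction is already logical; the toy data and the block and discard column names are made up for illustration:

```r
library(dplyr)

# Made-up data: one trial with four consecutive Direction runs of 5 rows each.
toy <- data.frame(
  Trial     = rep(1, 20),
  Direction = rep(c(TRUE, FALSE, TRUE, FALSE), each = 5),
  Y         = c(138, 40, 32, 192, 99,    # TRUE run, nothing < 20: keep
                34, 187, 53, 79, 8,      # FALSE run, 187 > 180: discard
                10, 50, 60, 70, 80,      # TRUE run, 10 < 20: discard
                100, 120, 90, 60, 150),  # FALSE run, nothing > 180: keep
  velocity  = as.numeric(1:20)
)

masked <- toy %>%
  group_by(Trial) %>%
  # block increases by one every time Direction flips within a trial
  mutate(block = cumsum(Direction != lag(Direction, default = first(Direction)))) %>%
  group_by(Trial, block) %>%
  mutate(
    # one flag per run: does any of its first five Y values break the rule?
    discard = if (first(Direction)) any(head(Y, 5) < 20) else any(head(Y, 5) > 180),
    velocity = case_when(
      discard              ~ NA_real_,  # whole run is masked
      Direction & Y < 20   ~ NA_real_,  # case-by-case, Direction TRUE
      !Direction & Y > 180 ~ NA_real_,  # case-by-case, Direction FALSE
      TRUE                 ~ velocity
    )
  ) %>%
  ungroup()
```

Grouping by Trial and block instead of Trial and Direction keeps a later FALSE run from being judged by the first five values of an earlier one.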
Upvotes: 1