Reputation: 5
I have a dataframe where I would like to fill in values from column "reference column" into the column "trial_number" based on the following two conditions:
The values should only be filled into column "trial_number" if both conditions are met at the same time.
One complication is that the columns "start_time" and "end_time" contain more values than columns "a", "b" and "reference_column". The idea is that each of the values in "reference_column" is filled in multiple rows of column "trial_number".
I as well have the columns "a", "b" and "reference_column" in a separate dataframe.
A simplified version of the dataframe looks like this:
reference_column a b trial_number start_time end_time
1 1 10 20 NA 12 18
2 2 22 24 NA 22 23
3 3 30 40 NA 23 24
4 4 45 55 NA 31 32
5 NA NA NA NA 35 36
6 NA NA NA NA 36 37
7 NA NA NA NA 37 39
8 NA NA NA NA 46 51
9 NA NA NA NA 47 53
10 NA NA NA NA 50 54
In the end, the desired output (filled values in "trial_number" should look like this:
reference_column a b trial_number start_time end_time
1 1 10 20 01 12 18
2 2 22 24 02 22 23
3 3 30 40 02 23 24
4 4 45 55 03 31 32
5 NA NA NA 03 35 36
6 NA NA NA 03 36 37
7 NA NA NA 03 37 39
8 NA NA NA 04 46 51
9 NA NA NA 04 47 53
10 NA NA NA 04 50 54
Since I am quite new to R, I would be very happy about any recommendations. I already tried to use a for loop in combination with an if-statement, but wasn't successful because of the unbalanced number of data entries and various conditions. Thanks a lot in advance!
Upvotes: 0
Views: 74
Reputation: 1922
Perhaps this approach would be helpful?
Dataset:
library(tidyverse)
df <- data.frame(referenceColumn = seq(from =1, to = 10, by =1),
a = c(10, 22, 30, 45, NA, NA, NA, NA, NA, NA),
b = c(20, 24, 40, 55, NA, NA, NA, NA, NA, NA),
trialNumber = rep(NA, 10),
startTime = c(12, 22, 23, 31, 35, 36, 37, 46, 47, 50),
endTime = c(18, 23, 24, 32, 36, 37, 39, 51, 53, 54))
Separate your reference table and dataset
dimDf <- df %>%
select(referenceColumn:b)
factDf <- df %>%
select(trialNumber:endTime)
Use your reference table in a case_when()
dfNew <- factDf %>%
mutate(trialNumber = case_when(
startTime >= dimDf$a[1] & endTime <= dimDf$b[1] ~ dimDf$referenceColumn[1],
startTime >= dimDf$a[2] & endTime <= dimDf$b[2] ~ dimDf$referenceColumn[2],
startTime >= dimDf$a[3] & endTime <= dimDf$b[3] ~ dimDf$referenceColumn[3],
startTime >= dimDf$a[4] & endTime <= dimDf$b[4] ~ dimDf$referenceColumn[4],
TRUE ~ NA_real_
))
or keep all columns
dfNew <- df %>%
mutate(trialNumber = case_when(
startTime >= dimDf$a[1] & endTime <= dimDf$b[1] ~ dimDf$referenceColumn[1],
startTime >= dimDf$a[2] & endTime <= dimDf$b[2] ~ dimDf$referenceColumn[2],
startTime >= dimDf$a[3] & endTime <= dimDf$b[3] ~ dimDf$referenceColumn[3],
startTime >= dimDf$a[4] & endTime <= dimDf$b[4] ~ dimDf$referenceColumn[4],
TRUE ~ NA_real_
))
Upvotes: 1
Reputation: 1087
My answer is going to be incomplete because I don't fully understand what you are doing with this statement:
One complication is that the columns "start_time" and "end_time" contain more values than columns "a", "b" and "reference_column". The idea is that each of the values in "reference_column" is filled in multiple rows of column "trial_number".
But I think I have some directional help:
library(tidyverse)
df <- tribble(~reference_column, ~a, ~b, ~trial_number, ~start_time, ~end_time,
1, 10, 20, NA, 12, 18,
2, 22, 24, NA, 22, 23,
3, 30, 40, NA, 23, 24,
4, 45, 55, NA, 31, 32,
NA, NA, NA, NA, 35, 36,
NA, NA, NA, NA, 36, 37,
NA, NA, NA, NA, 37, 39,
NA, NA, NA, NA, 46, 51,
NA, NA, NA, NA, 47, 53,
NA, NA, NA, NA, 50, 54)
df %>%
mutate(trial_number = if_else(start_time >= a & end_time <= b,
reference_column,
trial_number))
# A tibble: 10 × 6
reference_column a b trial_number start_time end_time
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 10 20 1 12 18
2 2 22 24 2 22 23
3 3 30 40 NA 23 24
4 4 45 55 NA 31 32
5 NA NA NA NA 35 36
6 NA NA NA NA 36 37
7 NA NA NA NA 37 39
8 NA NA NA NA 46 51
9 NA NA NA NA 47 53
10 NA NA NA NA 50 54
In your example output, you find some way to put values for "trial_number" even where df$a
and df$b
are == NA
. Please either explain the additional logic in your question, or hopefully this is enough to give you some ideas how to proceed.
Upvotes: 0