Noel
Noel

Reputation: 109

How to create a new column based on other columns with if conditions in r

Not able to find a way to generate a new column based with if conditions for group of events in a column.

The column called "BF" represent the (i-3) of the flow column, and is going to be the same BF for each "event" group. For example, in row 5, the value of "BF" is 39, which is the previous 3rd value of the flow column (flow for row 2) for all the "2" in the event column. The problem is that BF[i] can't be bigger than flow[i]. If BF[i] is bigger than flow[i], then the BF should be the (i-4) or (i-5) or (1-6)... of the flow until BF[i] will be equal or smaller than flow[i]. For example, in row 10 the value of the column "BF" is bigger than the value of the column "flow", therefore, the value of BF_1 (column I want to create) in row 10 is 37, which represent the closest lower value of flow, in this case the flow[i-6].

As an example, we have the following dataframe:

flow<- c(40, 39, 38, 37, 50, 49, 46, 44, 43, 45, 40, 30, 80, 75, 50, 55, 53, 51, 49, 100)
event<- c(1,1,1,1,2,2,2,2,2,3,3,3,4,4,4,5,5,5,5,6)
BF<- c(NA, NA, NA, NA, 39, 39, 39, 39, 39, 46, 46, 46, 45, 45, 45, 80, 80, 80, 80, 53)
a<- data.frame(flow, event, BF)

This is the desire output I'm looking for. I want to create the BF_1 column.

   flow event BF  BF_1
1    40   1   NA   NA
2    39   1   NA   NA
3    38   1   NA   NA
4    37   1   NA   NA
5    50   2   39   39
6    49   2   39   39
7    46   2   39   39
8    44   2   39   39
9    43   2   39   39
10   45   3   46   37
11   40   3   46   37
12   30   3   46   37
13   80   4   45   45
14   75   4   45   45
15   50   4   45   45
16   55   5   80   30
17   53   5   80   30
18   51   5   80   30
19   49   5   80   30
20  100   6   53   53

Is there a possible way to generate the column BF_1? please let me know any thoughts. I am working with for loops and using if conditions but I am not able to hold the BF value for the entire group of the event column.

Upvotes: 2

Views: 1401

Answers (2)

tmfmnk
tmfmnk

Reputation: 39858

One tidyverse possibility could be:

a %>%
 left_join(crossing(a, a) %>%
            filter(event > event1) %>%
            group_by(event) %>%
            filter(flow == first(flow)) %>%
            slice(1:(n() - 3)) %>%
            slice(which.max(cumsum(flow > flow1))) %>%
            ungroup() %>%
            transmute(event,
                      flow_flag = flow1), by = c("event" = "event")) %>%
 mutate(BF_1 = ifelse(lag(flow, 3) > flow, flow_flag, lag(flow, 3))) %>%
 group_by(event) %>%
 mutate(BF_1 = first(BF_1)) %>%
 select(-flow_flag)

    flow event    BF  BF_1
   <dbl> <dbl> <dbl> <dbl>
 1    40     1    NA    NA
 2    39     1    NA    NA
 3    38     1    NA    NA
 4    37     1    NA    NA
 5    50     2    39    39
 6    49     2    39    39
 7    46     2    39    39
 8    44     2    39    39
 9    43     2    39    39
10    45     3    46    37
11    40     3    46    37
12    30     3    46    37
13    80     4    45    45
14    75     4    45    45
15    50     4    45    45
16    55     5    80    30
17    53     5    80    30
18    51     5    80    30
19    49     5    80    30
20   100     6    53    53

It could be overcomplicated, but what it does is, first, creating all combinations of values (as the desired value can be theoretically anywhere in the data). Second, it identifies the first case per group fulfilling the condition (not taking into account the previous 3rd value). Finally, it combines it with the original df and if the 3rd previous value per group is fulfilling the condition, then returns it, otherwise returns the value first fulfilling condition to be smaller than the actual value.

Upvotes: 1

beavis11111
beavis11111

Reputation: 574

coding a bit inefficient, could have use dplyr etc.., but it will do the work and matching the BF_1 column given

flow <- c(40, 39, 38, 37, 50, 49, 46, 44, 43, 45, 40, 30, 80, 75, 50, 55, 53, 51, 49, 100)
event <- c(1,1,1,1,2,2,2,2,2,3,3,3,4,4,4,5,5,5,5,6)
BF <- c(NA, NA, NA, NA, 39, 39, 39, 39, 39, 46, 46, 46, 45, 45, 45, 80, 80, 80, 80, 53)
a <- data.frame(flow, event, BF)

a$BF_1 <- NA #default to NA first

for(i in 1:length(unique(a$event))){

  if(is.na(a[a$event == i, "BF"][1])) next

  if(a[a$event == i, "BF"][1] < a[a$event == i, "flow"][1]) a[a$event == i, "BF_1"] <- a[a$event == i, "BF"][1]

  if(a[a$event == i, "BF"][1] > a[a$event == i, "flow"][1]) {
    head <- min(which(a$event==i))-6 
    if (min(head-6) < 0) head <- 1 #making sure it doesn't overflow to row 0
    a[a$event == i, "BF_1"] <- min( a[  head:min(which(a$event==i)), "flow"] ) #fill the min of the subset flow column given position
  }

}

a

Upvotes: 1

Related Questions