Vanie Butil

Reputation: 11

How to do a double loop + else if?

Being new to R, I am having trouble writing the appropriate code (I assume it has to involve if/else statements and a loop).

Concretely, I would like to compare two columns, "Monthly_Category" and "Ref_Category" (see the simplified example below; my actual dataset is much longer). The "Ref_Category" to use is the one calculated at the 5th period of each element (Element_Id) using the mode; after that, we move on to the next element.

Months  Element_Id  Monthly_Category  Ref_Category  Expected_output
1       1           3                 NA            0
2       1           2                 NA            0
3       1           2                 NA            1
4       1           1                 NA            1
5       1           3                 3             0
1       2           6                 2             0
2       2           6                 6             1
3       2           NA                1             0
4       2           NA                6             0
5       2           1                 1             0
 

More precisely, I would like to set 1 as soon as "Monthly_Category" differs from the selected "Ref_Category" (which is calculated every 5 observations) for 2 periods in a row, and 0 otherwise.

In addition, I would like rows where Monthly_Category is NA to give 0 directly, because in the end I will only use the rows containing 1 (the NA rows don't interest me).

For each element (1 element = 5 rows), the reference category is calculated at the end of the 5 periods using the mode. However, because the formula was dragged down the column, every row has a value, whereas I should only consider the last one of each element (so every 5th row). That's why I thought two loops were needed: one to check the monthly category on each row and one to pick up the reference category every 5 rows.

Do you have any idea of the code that could allow me to do this?

Many thanks to anyone who can enlighten me,

Vanie

Upvotes: 1

Views: 114

Answers (1)

Edo

Reputation: 7818

First of all, please have a look at the questions that @John Coleman and I asked you in the comments, because my solution may change depending on your answers.

Anyway, you don't need an explicit for loop or an explicit if/else to get the job done.

In R, you usually avoid writing for loops directly; it is better to use a functional such as lapply. In this case, the dplyr package takes care of the looping implicitly.
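
To illustrate the general point about loops and functionals before turning to the dplyr solution, here is a minimal sketch on made-up data. The list `values` and the result names are purely illustrative, and `vapply` is used as a stricter sibling of `lapply` that also checks the type of each result.

```r
# Hypothetical toy data: three numeric vectors of different lengths
values <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# Explicit for loop: preallocate the result, then fill it element by element
means_loop <- numeric(length(values))
for (i in seq_along(values)) {
  means_loop[i] <- mean(values[[i]])
}

# Functional style: vapply() applies mean() to each element and checks
# that every result is a single numeric value
means_fun <- vapply(values, mean, numeric(1))

# Both approaches give the same numbers (vapply() also keeps the list names)
all.equal(means_loop, unname(means_fun))
```

The functional version states what is computed for each element and leaves the iteration to R, which is the same idea that `group_by` and `mutate` apply to whole groups of rows.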

df <-  tibble::tribble(~Months,  ~Element_Id,  ~Monthly_Category,  ~Ref_Category,  ~Expected_output,
                       1      ,            1,                  3,             NA,                 0,
                       2      ,            1,                  2,             NA,                 0,
                       3      ,            1,                  2,             NA,                 1,
                       4      ,            1,                  1,             NA,                 1,
                       5      ,            1,                  3,              3,                 0,

                       1      ,            2,                  6,              2,                 0,
                       2      ,            2,                  6,              6,                 1,
                       3      ,            2,                  1,              1,                 0,
                       4      ,            2,                  1,              6,                 0,
                       5      ,            2,                  1,              1,                 0)


library(dplyr)
library(purrr)

df %>%

  # check if elements are equal
  mutate(Real_Expected_output = !map2_lgl(Monthly_Category, Ref_Category, identical)) %>% 

  # sort by Element_Id and Months just in case your data is messy
  arrange(Element_Id, Months) %>% 

  # For each Element_Id ...
  group_by(Element_Id) %>% 

  #  ... define your Expected Output
  mutate(Real_Expected_output = as.integer(lag(Real_Expected_output, default = FALSE) & 
                                             lag(Real_Expected_output, 2, default = FALSE))) %>% 
  ungroup()


#   Months Element_Id Monthly_Category Ref_Category Expected_output Real_Expected_output
#   <dbl>      <dbl>            <dbl>        <dbl>           <dbl>                <int>
#       1          1                3           NA               0                    0
#       2          1                2           NA               0                    0
#       3          1                2           NA               1                    1
#       4          1                1           NA               1                    1
#       5          1                3            3               0                    1
#       1          2                6            2               0                    0
#       2          2                6            6               1                    0
#       3          2                1            1               0                    0
#       4          2                1            6               0                    0
#       5          2                1            1               0                    0

Real_Expected_output differs from your Expected_output because I believe your expected result contradicts your written requirements, as I said in one of the comments.

EDIT:

Based on your comment, I suppose this is what you're looking for. Again, no loops: you just need to make good use of the tools that the dplyr package already provides, i.e. last, group_by and mutate.

df %>%

  # sort by Element_Id and Months just in case your data is messy
  arrange(Element_Id, Months) %>% 

  # For each Element_Id ...
  group_by(Element_Id) %>% 

  # ... check if Monthly Category is equal to the last Ref_Category
  mutate(Real_Expected_output = !map2_lgl(Monthly_Category, last(Ref_Category), identical)) %>% 


  #  ... and define your Expected Output
  mutate(Real_Expected_output = as.integer(Real_Expected_output & 
                                             lag(Real_Expected_output, default = FALSE))) %>% 

  ungroup()

#   Months Element_Id Monthly_Category Ref_Category Expected_output Real_Expected_output
#   <dbl>      <dbl>            <dbl>        <dbl>           <dbl>                 <int>
#       1          1                3           NA               0                     0
#       2          1                2           NA               0                     0
#       3          1                2           NA               1                     1
#       4          1                1           NA               1                     1
#       5          1                3            3               0                     0
#       1          2                6            2               0                     0
#       2          2                6            6               1                     1
#       3          2                1            1               0                     0
#       4          2                1            6               0                     0
#       5          2                1            1               0                     0
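
The "two periods in a row" check used above may be easier to see in isolation. The following is a small sketch on a hypothetical logical vector (the names `differs` and `previous` are made up for illustration), showing how combining a vector with its own `lag` flags the second and later rows of a run of consecutive TRUE values.

```r
library(dplyr)

# Hypothetical flags: TRUE where the monthly category differs from the reference
differs <- c(FALSE, TRUE, TRUE, TRUE, FALSE)

# lag() shifts the vector one position to the right; `default` fills the
# first slot, which has no predecessor
previous <- lag(differs, default = FALSE)

# 1 only where the current AND the previous value are both TRUE
as.integer(differs & previous)
# 0 0 1 1 0
```

Inside a `group_by(Element_Id)`, `lag` restarts at the first row of each group, so the last row of one element never leaks into the first row of the next.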

EDIT 2:

I'll edit it again based on your request. At this point, I'd suggest creating a separate function to handle your problem. It looks cleaner.


df <-  tibble::tribble(~Months,  ~Element_Id,  ~Monthly_Category,  ~Ref_Category,  ~Expected_output,
                       1      ,            1,                  3,             NA,                 0,
                       2      ,            1,                  2,             NA,                 0,
                       3      ,            1,                  2,             NA,                 1,
                       4      ,            1,                  1,             NA,                 1,
                       5      ,            1,                  3,              3,                 0,

                       1      ,            2,                  6,              2,                 0,
                       2      ,            2,                  6,              6,                 1,
                       3      ,            2,                 NA,              1,                 0,
                       4      ,            2,                 NA,              6,                 0,
                       5      ,            2,                  1,              1,                 0)


library(dplyr)
library(purrr)


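get_output <- function(mon, ref){

  # set your condition here: mon is not NA and differs from the last ref of the group
  differs <- !is.na(mon) & !map2_lgl(mon, last(ref), identical)

  # flag rows where both the current and the previous row differ, then convert to integer
  as.integer(differs & lag(differs, default = FALSE))

}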


df %>%

  # sort by Element_Id and Months just in case your data is messy
  arrange(Element_Id, Months) %>% 

  # For each Element_Id ...
  group_by(Element_Id) %>% 

  # ... launch your function
  mutate(Real_Expected_output = get_output(Monthly_Category, Ref_Category)) %>% 

  ungroup()



# # A tibble: 10 x 6
#     Months Element_Id Monthly_Category Ref_Category Expected_output Real_Expected_output
#     <dbl>      <dbl>            <dbl>        <dbl>           <dbl>                <int>
#  1      1          1                3           NA               0                    0
#  2      2          1                2           NA               0                    0
#  3      3          1                2           NA               1                    1
#  4      4          1                1           NA               1                    1
#  5      5          1                3            3               0                    0
#  6      1          2                6            2               0                    0
#  7      2          2                6            6               1                    1
#  8      3          2               NA            1               0                    0
#  9      4          2               NA            6               0                    0
# 10      5          2                1            1               0                    0
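
For readers who would rather avoid extra packages, the same logic can be sketched in base R. This is an alternative written for illustration, not the dplyr approach above: the helper name `flag_two_in_a_row` is hypothetical, and it assumes the last Ref_Category of each element is never NA.

```r
# Data from the question, as a plain data.frame
df <- data.frame(
  Months           = rep(1:5, times = 2),
  Element_Id       = rep(1:2, each = 5),
  Monthly_Category = c(3, 2, 2, 1, 3, 6, 6, NA, NA, 1),
  Ref_Category     = c(NA, NA, NA, NA, 3, 2, 6, 1, 6, 1),
  Expected_output  = c(0, 0, 1, 1, 0, 0, 1, 0, 0, 0)
)

# Hypothetical helper, base R only: 1 where the monthly category differs from
# the group's last reference category for the second period in a row
flag_two_in_a_row <- function(mon, ref) {
  ref_last <- ref[length(ref)]                 # assumed not to be NA
  differs  <- !is.na(mon) & mon != ref_last    # NA months count as "no difference"
  previous <- c(FALSE, differs[-length(differs)])
  as.integer(differs & previous)
}

# Sort by element and month in case the data is not ordered
df <- df[order(df$Element_Id, df$Months), ]

# Apply the helper element by element; unsplit() restores the row order
groups <- split(df, df$Element_Id)
flags  <- lapply(groups, function(g) flag_two_in_a_row(g$Monthly_Category, g$Ref_Category))
df$Real_Expected_output <- unsplit(flags, df$Element_Id)

# Compare with the expected column from the question
all(df$Real_Expected_output == df$Expected_output)
```

The implicit loop is still there (in `lapply`), but each element is handled by one small function, which keeps the "every 5 rows" bookkeeping out of the code.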


Upvotes: 2
