i have a df which a part of is similar to the following | Number|Category| A1|A2|B1|B2|C1|C2|A |B |C | | ------| -------|---|--|--|--|--|--|--|--|--| | 1 | 1 | 10|30|5 |15|NA|NA|5 |10|NA| | 2 | 2 | 10|30|5 |15|25|35|40|20|45| The conditions are A1 & A2, B1 & B2, C1 & C2 are the lower and upper limits respectively, of the factors A, B, C and columns A, B, C represent the measurements. If the measurement is under the lower limit the factor is "passed" , if it is in between the two limits, then the factor is in "danger" , if the measurement is higher than the higher limit then it is "failed" . For the category=1 we are permitted to have only 1 failure in one of the factors and in that case we classify the asset as "risk" , but if we have 2 failures then the asset in the row 1 "fail" . For Category=2 permitted 2 failures. If one factor fails is at "at risk" , if we have 2 failures is "risk" and we have 3 failures then its "fail" . So I would like to calculate for every row(asset) the status of every factor and then the status of the asset. I am trying to that with a for loop and an if-else statement that iterates through all these columns of every row but seems difficult as I am a beginner. The final result is to attach the following columns to the dataset. Thank you in advance |Number|Aa |Bb |Cc |Result | |------|------|------|------|-------| |1 |passed|danger|NA | risk | |2 |failed|failed|failed| failed|

Reputation: 87

For loop with if else

i have a df which a part of is similar to the following

| Number|Category| A1|A2|B1|B2|C1|C2|A |B |C |
| ------| -------|---|--|--|--|--|--|--|--|--|
| 1     |  1     | 10|30|5 |15|NA|NA|5 |10|NA|
| 2     |  2     | 10|30|5 |15|25|35|40|20|45|

The conditions are

A1 & A2, B1 & B2, C1 & C2 are the lower and upper limits respectively, of the factors A, B, C
and columns A, B, C represent the measurements.
If the measurement is under the lower limit the factor is "passed",
if it is in between the two limits, then the factor is in "danger",
if the measurement is higher than the higher limit then it is "failed".
For the category=1 we are permitted to have only 1 failure in one of the factors and in that case we classify the asset as "risk",
but if we have 2 failures then the asset in the row 1 "fail".
For Category=2 permitted 2 failures. If one factor fails is at "at risk", if we have 2 failures is "risk" and we have 3 failures then its "fail".

So I would like to calculate for every row(asset) the status of every factor and then the status of the asset. I am trying to that with a for loop and an if-else statement that iterates through all these columns of every row but seems difficult as I am a beginner. The final result is to attach the following columns to the dataset. Thank you in advance

|Number|Aa    |Bb    |Cc    |Result |
|------|------|------|------|-------|
|1     |passed|danger|NA    | risk  |
|2     |failed|failed|failed| failed|

Upvotes: 2

Answers (3)

Anoushiravan R

Reputation: 21938

You can also use the following solution which is a combination of base R and tidyverse:

library(dplyr)
library(purrr)

colnames <- c(1, 2) 
tmp <- df[-colnames]


lapply(split.default(tmp, gsub("(\\w)\\d+?", "\\1", names(tmp))), 
       function(x) cbind(df[colnames], x)) %>% 
  imap(~ .x %>% 
        mutate(!!{.y} := pmap_chr(., ~ 
                            ifelse(any(is.na(..3), is.na(..4), is.na(..5)), "NA", 
                                   ifelse(..5 > ..3 & ..5 < ..4, "danger", ifelse(..5 < ..3, "passed", "failed"))))) %>%
         select(-c(3, 4))) %>%
  reduce(~ full_join(..1, ..2, id = c("Number", "Category"))) %>%
  rowwise() %>%
  mutate(Result = case_when(
    Category == 1 & sum(c_across(A:C) == "failed") <= 1 ~ "Risk",
    Category == 1 & sum(c_across(A:C) == "failed") > 1 ~ "Fail",
    Category == 2 & sum(c_across(A:C) == "failed") == 1 ~ "At_Risk",
    Category == 2 & sum(c_across(A:C) == "failed") == 2 ~ "Risk",
    Category == 2 & sum(c_across(A:C) == "failed") == 3 ~ "Fail" 
  ))

# A tibble: 2 x 6
# Rowwise: 
  Number Category A      B      C      Result
   <dbl>    <dbl> <chr>  <chr>  <chr>  <chr> 
1      1        1 passed danger NA     Risk  
2      2        2 failed failed failed Fail

Upvotes: 1

AnilGoyal

Reputation: 26238

This can be done in dplyr only without even reshaping the data or using any loop (for/while). Using across, cur_data() and cur_column() which are certainly powerful functions from dplyr.

library(dplyr, warn.conflicts = F)

df
#>   Number Category A1 A2 B1 B2 C1 C2  A  B  C
#> 1      1        1 10 30  5 15 NA NA  5 10 NA
#> 2      2        2 10 30  5 15 25 35 40 20 45

df %>% group_by(Number, Category) %>%
  transmute(across(c('A', 'B', 'C'), ~ case_when(is.na(.) | is.na(get(paste0(cur_column(), 1))) | 
                                                is.na(get(paste0(cur_column(), 2))) ~ NA_character_,
                                              . < get(paste0(cur_column(), 1)) ~ 'passed',
                                              . <= get(paste0(cur_column(), 2)) ~ 'danger',
                                              TRUE ~ 'failed'),
                   .names = '{.col}{tolower(.col)}')) %>%
  mutate(Result = ifelse(rowSums(cur_data() == 'failed', na.rm = T) <= Category, 'risk', 'failed'))

#> # A tibble: 2 x 6
#> # Groups:   Number, Category [2]
#>   Number Category Aa     Bb     Cc     Result
#>    <int>    <int> <chr>  <chr>  <chr>  <chr> 
#> 1      1        1 passed danger <NA>   risk  
#> 2      2        2 failed failed failed failed

^{Created on 2021-07-06 by the reprex package (v2.0.0)}

Upvotes: 1

Limey

Reputation: 12585

Much of your problem is caused by the untidy nature of your data frame. I started to provide solutions based on both your untidy data and a tidy equivalent, but the untidy solution, whilst possible, became just too painful.

So, here's a solution based on a tidy equivalent of your data frame.

First, make it tidy. The reason your data frame is untidy is that your column names contain information, namely that A1 and A2 contain the acceptance limits for values in A, and so on. We can correct this by making the data frame longer.

The process is a little long because of the extent of the untidyness of the original. It might be possible to create a more compact version of the transformation using, say, names_pattern and other advanced arguments to pivot_longer(), but the long version at least has the benefit of clarity.

longDF <- df %>% 
  select(Number, Category, A, B, C) %>% 
  pivot_longer(
    c(-Category, -Number),
    names_to="Variable",
    values_to="Value"
  ) %>% 
  left_join(
    df %>% 
      select(Number, Category, A1, B1, C1) %>% 
      pivot_longer(
        c(-Category, -Number),
        names_to="Variable",
        values_to="Lower"
      ) %>% 
    mutate(Variable=str_sub(Variable, 1, 1)),
    by=c("Number", "Category", "Variable")
  ) %>% 
  left_join(
    df %>% 
      select(Number, Category, A2, B2, C2) %>% 
      pivot_longer(
      c(-Category, -Number),
      names_to="Variable",
      values_to="Upper"
    ) %>% 
    mutate(Variable=str_sub(Variable, 1, 1)),
    by=c("Number", "Category", "Variable")
  )
longDF
# A tibble: 6 x 6
  Number Category Variable Value Lower Upper
   <dbl>    <dbl> <chr>    <dbl> <dbl> <dbl>
1      1        1 A            5    10    30
2      1        1 B           10     5    15
3      1        1 C           NA    NA    NA
4      2        2 A           40    10    30
5      2        2 B           20     5    15
6      2        2 C           45    25    35

So at this point, we have columns that define the Category of the test, the Variable being measured, its Value and the two acceptance limits (Lower and Upper).

Now, determining the acceptability of each Value is straightforward.

longDF <- longDF %>% 
            mutate(
              Result=ifelse(
                       Value < Lower, 
                       "Pass", 
                       ifelse(Value < Upper, "Danger", "Fail")
                     )
            )
longDF
# A tibble: 6 x 7
  Number Category Variable Value Lower Upper Result
   <dbl>    <dbl> <chr>    <dbl> <dbl> <dbl> <chr> 
1      1        1 A            5    10    30 Pass  
2      1        1 B           10     5    15 Danger
3      1        1 C           NA    NA    NA NA    
4      2        2 A           40    10    30 Fail  
5      2        2 B           20     5    15 Fail  
6      2        2 C           45    25    35 Fail

Also, note that the categorisation of each value is independent of both the Variable and the number of possible variables. So the code is robust in these respects.

Now we can categorise the results by Number and Category.

longDF %>% 
  group_by(Number, Category, Result) %>% 
  summarise(N=n(), .groups="drop") %>% 
  pivot_wider(
    names_from=Result, 
    values_from=N, 
    values_fill=0
  )
# A tibble: 2 x 7
  Number Category Danger  Pass  `NA`  Fail 
   <dbl>    <dbl>  <int> <int> <int> <int>  
1      1        1      1     1     1     0  
2      2        2      0     0     0     3

Again, we are robust with respect to both the number of Categorys and Numbers, and their labels.

Evaluating the overall results is also straightforward, but slightly long winded because of the various options. Note that your text is inconsistent with the desired output because you haven't explained how an overall result of "warn" for Category = 1 is obtained. I've gone with the text. if you want to match the sample output, the changes to the code should be simple once the criteria are defined.

longDF %>% 
  group_by(Number, Category, Result) %>% 
  summarise(N=n(), .groups="drop") %>% 
  pivot_wider(
    names_from=Result, 
    values_from=N, 
    values_fill=0
  ) %>% 
  mutate(
    Result=ifelse(
             Category == 1, 
             ifelse(Fail == 0, "Pass", ifelse(Fail == 1, "Risk", "Fail")),
             ifelse(Fail < 2, "Pass", ifelse(Fail == 2, "Risk", "Fail"))
            )
  )
# A tibble: 2 x 7
  Number Category Danger  Pass  `NA`  Fail Result
   <dbl>    <dbl>  <int> <int> <int> <int> <chr> 
1      1        1      1     1     1     0 Pass  
2      2        2      0     0     0     3 Fail

If you need to know which Variable caused potential failures, that can also be obtained from longDF with a small change to the grouping.

longDF %>% 
  group_by(Category, Variable, Result) %>% 
  summarise(N=n(), .groups="drop") %>% 
  pivot_wider(
    names_from=Variable, 
    values_from=Result
  )
# A tibble: 2 x 5
  Category     N A     B      C    
     <dbl> <int> <chr> <chr>  <chr>
1        1     1 Pass  Danger NA   
2        2     1 Fail  Fail   Fail

And, of course, you could join these two data frames together to get a comprehensive description of both the overall results and the component variable assessments.

Upvotes: 0

For loop with if else

Answers (3)

Related Questions