Reputation: 87
i have a df which a part of is similar to the following
| Number|Category| A1|A2|B1|B2|C1|C2|A |B |C |
| ------| -------|---|--|--|--|--|--|--|--|--|
| 1 | 1 | 10|30|5 |15|NA|NA|5 |10|NA|
| 2 | 2 | 10|30|5 |15|25|35|40|20|45|
The conditions are
"passed"
,"danger"
,"failed"
.category=1
we are permitted to have only 1
failure in one of the factors and in that case we classify the asset as "risk"
,"fail"
.Category=2
permitted 2 failures. If one factor fails is at "at risk"
, if we have 2 failures is "risk"
and we have 3 failures then its "fail"
.So I would like to calculate for every row(asset) the status of every factor and then the status of the asset. I am trying to that with a for loop and an if-else statement that iterates through all these columns of every row but seems difficult as I am a beginner. The final result is to attach the following columns to the dataset. Thank you in advance
|Number|Aa |Bb |Cc |Result |
|------|------|------|------|-------|
|1 |passed|danger|NA | risk |
|2 |failed|failed|failed| failed|
Upvotes: 2
Views: 173
Reputation: 21938
You can also use the following solution which is a combination of base R and tidyverse
:
library(dplyr)
library(purrr)
colnames <- c(1, 2)
tmp <- df[-colnames]
lapply(split.default(tmp, gsub("(\\w)\\d+?", "\\1", names(tmp))),
function(x) cbind(df[colnames], x)) %>%
imap(~ .x %>%
mutate(!!{.y} := pmap_chr(., ~
ifelse(any(is.na(..3), is.na(..4), is.na(..5)), "NA",
ifelse(..5 > ..3 & ..5 < ..4, "danger", ifelse(..5 < ..3, "passed", "failed"))))) %>%
select(-c(3, 4))) %>%
reduce(~ full_join(..1, ..2, id = c("Number", "Category"))) %>%
rowwise() %>%
mutate(Result = case_when(
Category == 1 & sum(c_across(A:C) == "failed") <= 1 ~ "Risk",
Category == 1 & sum(c_across(A:C) == "failed") > 1 ~ "Fail",
Category == 2 & sum(c_across(A:C) == "failed") == 1 ~ "At_Risk",
Category == 2 & sum(c_across(A:C) == "failed") == 2 ~ "Risk",
Category == 2 & sum(c_across(A:C) == "failed") == 3 ~ "Fail"
))
# A tibble: 2 x 6
# Rowwise:
Number Category A B C Result
<dbl> <dbl> <chr> <chr> <chr> <chr>
1 1 1 passed danger NA Risk
2 2 2 failed failed failed Fail
Upvotes: 1
Reputation: 26238
This can be done in dplyr
only without even reshaping
the data or using any loop (for/while). Using across
, cur_data()
and cur_column()
which are certainly powerful functions from dplyr.
library(dplyr, warn.conflicts = F)
df
#> Number Category A1 A2 B1 B2 C1 C2 A B C
#> 1 1 1 10 30 5 15 NA NA 5 10 NA
#> 2 2 2 10 30 5 15 25 35 40 20 45
df %>% group_by(Number, Category) %>%
transmute(across(c('A', 'B', 'C'), ~ case_when(is.na(.) | is.na(get(paste0(cur_column(), 1))) |
is.na(get(paste0(cur_column(), 2))) ~ NA_character_,
. < get(paste0(cur_column(), 1)) ~ 'passed',
. <= get(paste0(cur_column(), 2)) ~ 'danger',
TRUE ~ 'failed'),
.names = '{.col}{tolower(.col)}')) %>%
mutate(Result = ifelse(rowSums(cur_data() == 'failed', na.rm = T) <= Category, 'risk', 'failed'))
#> # A tibble: 2 x 6
#> # Groups: Number, Category [2]
#> Number Category Aa Bb Cc Result
#> <int> <int> <chr> <chr> <chr> <chr>
#> 1 1 1 passed danger <NA> risk
#> 2 2 2 failed failed failed failed
Created on 2021-07-06 by the reprex package (v2.0.0)
Upvotes: 1
Reputation: 12585
Much of your problem is caused by the untidy nature of your data frame. I started to provide solutions based on both your untidy data and a tidy equivalent, but the untidy solution, whilst possible, became just too painful.
So, here's a solution based on a tidy equivalent of your data frame.
First, make it tidy. The reason your data frame is untidy is that your column names contain information, namely that A1
and A2
contain the acceptance limits for values in A
, and so on. We can correct this by making the data frame longer.
The process is a little long because of the extent of the untidyness of the original. It might be possible to create a more compact version of the transformation using, say, names_pattern
and other advanced arguments to pivot_longer()
, but the long version at least has the benefit of clarity.
longDF <- df %>%
select(Number, Category, A, B, C) %>%
pivot_longer(
c(-Category, -Number),
names_to="Variable",
values_to="Value"
) %>%
left_join(
df %>%
select(Number, Category, A1, B1, C1) %>%
pivot_longer(
c(-Category, -Number),
names_to="Variable",
values_to="Lower"
) %>%
mutate(Variable=str_sub(Variable, 1, 1)),
by=c("Number", "Category", "Variable")
) %>%
left_join(
df %>%
select(Number, Category, A2, B2, C2) %>%
pivot_longer(
c(-Category, -Number),
names_to="Variable",
values_to="Upper"
) %>%
mutate(Variable=str_sub(Variable, 1, 1)),
by=c("Number", "Category", "Variable")
)
longDF
# A tibble: 6 x 6
Number Category Variable Value Lower Upper
<dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 1 1 A 5 10 30
2 1 1 B 10 5 15
3 1 1 C NA NA NA
4 2 2 A 40 10 30
5 2 2 B 20 5 15
6 2 2 C 45 25 35
So at this point, we have columns that define the Category
of the test, the Variable
being measured, its Value
and the two acceptance limits (Lower
and Upper
).
Now, determining the acceptability of each Value
is straightforward.
longDF <- longDF %>%
mutate(
Result=ifelse(
Value < Lower,
"Pass",
ifelse(Value < Upper, "Danger", "Fail")
)
)
longDF
# A tibble: 6 x 7
Number Category Variable Value Lower Upper Result
<dbl> <dbl> <chr> <dbl> <dbl> <dbl> <chr>
1 1 1 A 5 10 30 Pass
2 1 1 B 10 5 15 Danger
3 1 1 C NA NA NA NA
4 2 2 A 40 10 30 Fail
5 2 2 B 20 5 15 Fail
6 2 2 C 45 25 35 Fail
Also, note that the categorisation of each value is independent of both the Variable
and the number of possible variables. So the code is robust in these respects.
Now we can categorise the results by Number
and Category
.
longDF %>%
group_by(Number, Category, Result) %>%
summarise(N=n(), .groups="drop") %>%
pivot_wider(
names_from=Result,
values_from=N,
values_fill=0
)
# A tibble: 2 x 7
Number Category Danger Pass `NA` Fail
<dbl> <dbl> <int> <int> <int> <int>
1 1 1 1 1 1 0
2 2 2 0 0 0 3
Again, we are robust with respect to both the number of Category
s and Number
s, and their labels.
Evaluating the overall results is also straightforward, but slightly long winded because of the various options. Note that your text is inconsistent with the desired output because you haven't explained how an overall result of "warn"
for Category
= 1
is obtained. I've gone with the text. if you want to match the sample output, the changes to the code should be simple once the criteria are defined.
longDF %>%
group_by(Number, Category, Result) %>%
summarise(N=n(), .groups="drop") %>%
pivot_wider(
names_from=Result,
values_from=N,
values_fill=0
) %>%
mutate(
Result=ifelse(
Category == 1,
ifelse(Fail == 0, "Pass", ifelse(Fail == 1, "Risk", "Fail")),
ifelse(Fail < 2, "Pass", ifelse(Fail == 2, "Risk", "Fail"))
)
)
# A tibble: 2 x 7
Number Category Danger Pass `NA` Fail Result
<dbl> <dbl> <int> <int> <int> <int> <chr>
1 1 1 1 1 1 0 Pass
2 2 2 0 0 0 3 Fail
If you need to know which Variable
caused potential failures, that can also be obtained from longDF
with a small change to the grouping.
longDF %>%
group_by(Category, Variable, Result) %>%
summarise(N=n(), .groups="drop") %>%
pivot_wider(
names_from=Variable,
values_from=Result
)
# A tibble: 2 x 5
Category N A B C
<dbl> <int> <chr> <chr> <chr>
1 1 1 Pass Danger NA
2 2 1 Fail Fail Fail
And, of course, you could join these two data frames together to get a comprehensive description of both the overall results and the component variable assessments.
Upvotes: 0