We can use the following data frame as an example:
Cases <- c("Siddhartha", "Siddhartha", "Siddhartha", "Paul", "Paul", "Paul", "Hannah")
Procedures <- c("1", "1", "2", "3", "3", "4", "1")
(df <- data.frame(Cases, Procedures))
       Cases Procedures
1 Siddhartha          1
2 Siddhartha          1
3 Siddhartha          2
4       Paul          3
5       Paul          3
6       Paul          4
7     Hannah          1
Now I do the following:
library(dplyr)

Sum_Group <- function(df, variable){
  variable <- enquo(variable)
  df %>%
    dplyr::group_by(!! variable) %>%
    dplyr::summarize(Number = n()) %>%
    dplyr::mutate(Prozent = round((Number/sum(Number)*100)))
}
Sum_Group(df, Procedures)
which gives me:
# A tibble: 4 x 3
  Procedures Number Prozent
  <fct>       <int>   <dbl>
1 1               3      43
2 2               1      14
3 3               2      29
4 4               1      14
This is not exactly what I want, though. What I want is the following data frame:
  Procedures Number Prozent
  <fct>       <int>   <dbl>
1 1               2      40
2 2               1      20
3 3               1      20
4 4               1      20
Notice the difference for procedures 1 and 3.
So what I would like is a function that counts multiple occurrences of the same procedure for one case as 1, rather than counting every occurrence as in the first example. The function should also work on varying data frames with different (unknown) cases and procedures.
I am not sure if this is easily done and I'm just overlooking something.
Regards
You want to count the number of distinct cases for each value of Procedures. You can use n_distinct to count that. You can also use the curly-curly operator ({{ }}), which does the job of both enquo and !! together.
library(dplyr)
library(rlang)

Sum_Group <- function(df, variable) {
  df %>%
    group_by({{variable}}) %>%                  # {{ }} = enquo() + !! in one step
    summarise(Number = n_distinct(Cases)) %>%   # count each case at most once per group
    mutate(Prozent = round((Number/sum(Number)*100)))
}
Sum_Group(df, Procedures)

# A tibble: 4 x 3
#  Procedures Number Prozent
#  <chr>       <int>   <dbl>
#1 1               2      40
#2 2               1      20
#3 3               1      20
#4 4               1      20
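The function above still hard-codes the Cases column, while the question asks for something that works on data frames whose columns may differ. A minimal sketch, assuming the case column is simply passed as a second argument: the function name Sum_Group2, the argument name case_var, and the renamed column Patient are all hypothetical, chosen only for illustration.

```r
library(dplyr)

# Question's data, with the case column renamed (hypothetical "Patient")
# to show that the function no longer depends on a fixed column name
df2 <- data.frame(
  Patient = c("Siddhartha", "Siddhartha", "Siddhartha", "Paul", "Paul", "Paul", "Hannah"),
  Procedures = c("1", "1", "2", "3", "3", "4", "1")
)

# variable: column to group by
# case_var: column whose distinct values are counted within each group
Sum_Group2 <- function(df, variable, case_var) {
  df %>%
    group_by({{ variable }}) %>%
    summarise(Number = n_distinct({{ case_var }})) %>%
    mutate(Prozent = round(Number / sum(Number) * 100))
}

Sum_Group2(df2, Procedures, Patient)
```

An equivalent formulation is to drop duplicate case/procedure pairs first with distinct() and then count the remaining rows per procedure; n_distinct() does the same thing in a single step inside summarise().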