Reputation:
We can use the following data frame as an example:
Case <- c("Siddhartha", "Siddhartha", "Siddhartha", "Paul", "Paul", "Paul", "Hannah")
Procedure <- c("1", "1", "2", "3", "3", "4", "1")
Location <- c("a", "a", "b", "a", "a", "b", "c")
(df <- data.frame(Case, Procedure, Location))
Case Procedure Location
1 Siddhartha 1 a
2 Siddhartha 1 a
3 Siddhartha 2 b
4 Paul 3 a
5 Paul 3 a
6 Paul 4 b
7 Hannah 1 c
Now i do the following:
df %>%
count(Location, Procedure) %>%
pivot_wider(names_from = Location, values_from = n, values_fill = list(n = 0))
which gives me:
# A tibble: 4 x 4
Procedure a b c
<fct> <int> <int> <int>
1 1 2 0 1
2 3 2 0 0
3 2 0 1 0
4 4 0 1 0
This is not exactly, what i want though. What i want is the following data frame:
# A tibble: 4 x 4
Procedure a b c
<fct> <int> <int> <int>
1 1 1 0 1
2 3 1 0 0
3 2 0 1 0
4 4 0 1 0
Notice the difference in Procedure 1 and 3.
So what i would like is a function, that counts the number of DISTINCT cases for each Procedures AND each location. Also that function should be working on varying data frames, where there are different (unknown) cases and procedures.
Regards
Upvotes: 0
Views: 71
Reputation: 719
The function distinct()
should work, it will remove any duplicates based on the combination of columns you feed it. The argument .keep_all = T
makes it so it doesn't delete the data in the rows not specified in the function. If case doesn't matter you can remove it or set it to F
df %>%
distinct(Procedure, Location, .keep_all = T) %>%
count(Location, Procedure) %>%
pivot_wider(names_from = Location, values_from = n, values_fill = list(n = 0))
Upvotes: 0