user13267770
user13267770

Reputation:

group by, summarise distinct cases for two variables

We can use the following data frame as an example:

Case <- c("Siddhartha", "Siddhartha", "Siddhartha", "Paul", "Paul", "Paul", "Hannah")
Procedure <- c("1", "1", "2", "3", "3", "4", "1")
Location <- c("a", "a", "b", "a", "a", "b", "c")

(df <- data.frame(Case, Procedure, Location))

        Case Procedure Location
1 Siddhartha         1        a
2 Siddhartha         1        a
3 Siddhartha         2        b
4       Paul         3        a
5       Paul         3        a
6       Paul         4        b
7     Hannah         1        c

Now i do the following:

df %>% 
  count(Location, Procedure) %>% 
  pivot_wider(names_from = Location, values_from = n, values_fill = list(n = 0))

which gives me:

# A tibble: 4 x 4
  Procedure     a     b     c
  <fct>     <int> <int> <int>
1 1             2     0     1
2 3             2     0     0
3 2             0     1     0
4 4             0     1     0

This is not exactly, what i want though. What i want is the following data frame:

# A tibble: 4 x 4
  Procedure     a     b     c
  <fct>     <int> <int> <int>
1 1             1     0     1
2 3             1     0     0
3 2             0     1     0
4 4             0     1     0

Notice the difference in Procedure 1 and 3.

So what i would like is a function, that counts the number of DISTINCT cases for each Procedures AND each location. Also that function should be working on varying data frames, where there are different (unknown) cases and procedures.

Regards

Upvotes: 0

Views: 71

Answers (1)

NotThatKindODr
NotThatKindODr

Reputation: 719

The function distinct() should work, it will remove any duplicates based on the combination of columns you feed it. The argument .keep_all = T makes it so it doesn't delete the data in the rows not specified in the function. If case doesn't matter you can remove it or set it to F

df %>% 
  distinct(Procedure, Location, .keep_all = T) %>% 
  count(Location, Procedure) %>% 
  pivot_wider(names_from = Location, values_from = n, values_fill = list(n = 0))

Upvotes: 0

Related Questions