Tabulate based on number of times ID occurs AND another column value R

Question

I'm formatting a table for a study to count how many times a person had a biological sample collected. The table shows how many separate clinic visits a person had a sample type collected.

I want to count how many times an ID appeared for each sample type. For example, if a person with ID = "1234" had the ID appear 4 times with "DNA" in the SAMPLETYPE column, then they had DNA collected at 4 visits. I want the table to look like this:

SAMPLETYPE	1 VISIT	2 VISITS	3 VISITS
DNA	80	47	24
RNA	36	12	6
SERUM	112	89	65

My dataframe is structured like this:

SAMPLETYPE <- c("DNA","DNA","RNA","PLASMA","RNA","RNA","DNA","PLASMA","PLASMA","PLASMA",
"RNA","RNA","RNA")
ID <- c("1","1","1","2","3","2","4","4","4","5","1","1","1")
df <- data.frame(ID,SAMPLETYPE)

df
   ID SAMPLETYPE
1   1        DNA
2   1        DNA
3   1        RNA
4   2     PLASMA
5   3        RNA
6   2        RNA
7   4        DNA
8   4     PLASMA
9   4     PLASMA
10  5     PLASMA

Above, ID = 1 had DNA collected at 2 visits and had RNA collected at 1 visit. The expected output would be:

SAMPLETYPE	1 VISIT	2 VISITS	3 VISITS
DNA	1	1	0
RNA	3	0	1
PLASMA	2	1	0

How would I manipulate this dataframe in R to build a table in the format of the example?

Jon Spring · Accepted Answer

library(tidyverse)
df %>% 
  count(SAMPLETYPE, ID) %>%   # count the combinations
  count(SAMPLETYPE, n) %>%    # count the frequency of combinations
  pivot_wider(names_from = n, names_prefix = "visits_", 
              values_from = nn, values_fill = 0)

I get different output from the most recent example data. For example, it looks like ID 1 had 4 RNA visits, and ID 2 & 3 each had 1.

# A tibble: 3 x 4
  SAMPLETYPE visits_1 visits_2 visits_4
                   
1 DNA               1        1        0
2 PLASMA            2        1        0
3 RNA               2        0        1

Tabulate based on number of times ID occurs AND another column value R

Answers (1)

Related Questions