Reputation: 79
I'm formatting a table for a study to count how many times a person had a biological sample collected. The table shows how many separate clinic visits a person had a sample type collected.
I want to count how many times an ID appeared for each sample type. For example, if a person with ID = "1234" had the ID appear 4 times with "DNA" in the SAMPLETYPE column, then they had DNA collected at 4 visits. I want the table to look like this:
SAMPLETYPE | 1 VISIT | 2 VISITS | 3 VISITS |
---|---|---|---|
DNA | 80 | 47 | 24 |
RNA | 36 | 12 | 6 |
SERUM | 112 | 89 | 65 |
My dataframe is structured like this:
SAMPLETYPE <- c("DNA","DNA","RNA","PLASMA","RNA","RNA","DNA","PLASMA","PLASMA","PLASMA",
"RNA","RNA","RNA")
ID <- c("1","1","1","2","3","2","4","4","4","5","1","1","1")
df <- data.frame(ID,SAMPLETYPE)
df
ID SAMPLETYPE
1 1 DNA
2 1 DNA
3 1 RNA
4 2 PLASMA
5 3 RNA
6 2 RNA
7 4 DNA
8 4 PLASMA
9 4 PLASMA
10 5 PLASMA
Above, ID = 1 had DNA collected at 2 visits and had RNA collected at 1 visit. The expected output would be:
SAMPLETYPE | 1 VISIT | 2 VISITS | 3 VISITS |
---|---|---|---|
DNA | 1 | 1 | 0 |
RNA | 3 | 0 | 1 |
PLASMA | 2 | 1 | 0 |
How would I manipulate this dataframe in R to build a table in the format of the example?
Upvotes: 1
Views: 63
Reputation: 66415
library(tidyverse)
df %>%
count(SAMPLETYPE, ID) %>% # count the combinations
count(SAMPLETYPE, n) %>% # count the frequency of combinations
pivot_wider(names_from = n, names_prefix = "visits_",
values_from = nn, values_fill = 0)
I get different output from the most recent example data. For example, it looks like ID 1 had 4 RNA visits, and ID 2 & 3 each had 1.
# A tibble: 3 x 4
SAMPLETYPE visits_1 visits_2 visits_4
<chr> <int> <int> <int>
1 DNA 1 1 0
2 PLASMA 2 1 0
3 RNA 2 0 1
Upvotes: 1