Reputation: 281
I used the following code to generate the data frame 'df' from my original data, 'pseudo'.
df <- pseudo %>%
  group_by(Drug, CLSI_interpretation) %>%
  summarise(n = n()) %>%
  filter(Drug %in% c('Cefepime', 'Ceftazidime', 'Piperacillin', 'Piperacillin/tazobactam', 'Imipenem', 'Meropenem', 'Doripenem', 'Ciprofloxacin', 'Levofloxacin', 'Gentamicin', 'Tobramycin', 'Amikacin')) %>%
  mutate(freq = n / sum(n) * 100)
I then use a very long mapvalues call to create the 'class' column from 'Drug'.
All good so far; this generates a dataset that looks like the following:
Drug           CLSI  n       freq         class
Amikacin       I     7213    4.25503047   Aminoglycosides
Amikacin       R     13995   8.25580915   Aminoglycosides
Amikacin       S     148309  87.48916038  Aminoglycosides
Cefepime       I     13326   8.87713502   Cephalosporins
Cefepime       R     9744    6.49098031   Cephalosporins
Cefepime       S     127046  84.63188468  Cephalosporins
Ceftazidime    I     10836   5.98558290   Cephalosporins
Ceftazidime    R     15276   8.43814732   Cephalosporins
Ceftazidime    S     154923  85.57626978  Cephalosporins
Ciprofloxacin  I     8949    4.74295103   Fluoroquinolones
Ciprofloxacin  R     31563   16.72832309  Fluoroquinolones
I'm struggling with the next steps. I need to group this data by 'class' and, for each class, total the 'n' where CLSI %in% c('I','R') and generate a new frequency. Basically, I want n(I+R)/n(I+R+S) and n(S)/n(I+R+S) for each class. I'm having a lot of trouble figuring out the summarise call, because I need to summarise one variable (n) conditional on another (CLSI) while keeping the data grouped by a third (class). Thanks for your help.
Upvotes: 1
Views: 2291
Reputation: 10215
It's always good to show the complete code, including how the data is read. It looks like pseudo is your data. The syntax of the steps in a %>% pipe is a little different from usual R, in that the first argument of each call is implicitly whatever is piped in. Put simply: remove "pseudo" from the calls inside the pipe.
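library(dplyr)
# Read the data (read.table splits on whitespace by default; first row is the header)
pseudo <- read.table("a.csv", header = TRUE)
# Count rows per class and CLSI category; the pipe supplies the data argument
pseudo <- pseudo %>%
  group_by(class, CLSI) %>%
  summarise(n = n())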
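If you start from the already aggregated table shown in the question instead of the raw data, the per-class ratios n(I+R)/n(I+R+S) and n(S)/n(I+R+S) could be computed roughly as below. This is only a sketch under some assumptions: the small df here is a hypothetical stand-in built from a few of the sample rows, the column names (CLSI, n, class) are taken from the printed table, and n_IR, n_S, total, freq_IR and freq_S are names made up for illustration.

```r
library(dplyr)

# Hypothetical miniature of the aggregated table from the question;
# counts are copied from the Amikacin and Cefepime sample rows.
df <- data.frame(
  Drug  = c("Amikacin", "Amikacin", "Amikacin", "Cefepime", "Cefepime", "Cefepime"),
  CLSI  = c("I", "R", "S", "I", "R", "S"),
  n     = c(7213, 13995, 148309, 13326, 9744, 127046),
  class = c("Aminoglycosides", "Aminoglycosides", "Aminoglycosides",
            "Cephalosporins", "Cephalosporins", "Cephalosporins")
)

by_class <- df %>%
  group_by(class) %>%
  summarise(
    n_IR  = sum(n[CLSI %in% c("I", "R")]),  # intermediate + resistant counts
    n_S   = sum(n[CLSI == "S"]),            # susceptible counts
    total = sum(n)                          # all counts in the class
  ) %>%
  mutate(
    freq_IR = n_IR / total * 100,
    freq_S  = n_S / total * 100
  )

print(by_class)
```

The idea is to subset the n column by the CLSI value inside sum(), so one summarise call can total different categories while staying grouped by class. Note that here n is summed rather than counted with n(), because each row of the aggregated table already holds a count.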
Upvotes: 6