Reputation: 1
I want to create a table in R from a larger data set. The table would be two-way, but the value in each cell would come from a third variable already in the dataset. Specifically, I am looking at the number of prescriptions for a class of medication by year. Thus one side of the table would be drug class (either A or B), the other would be year (2014-2018), and each cell would hold the number of prescriptions.
In the dataset, each row contains statistics for a given medication in a given year; a row is not an individual prescription or patient. There is a column for the number of prescriptions. All the summarize functions are giving me counts of rows, which is not what I am looking for.
Ultimately I would like to compare the proportion of patients in each medication class by year with a chi-square test.
Upvotes: 0
Views: 784
Reputation: 11076
Make up some data:
set.seed(42)
Diuretic <- sample(c("yes", "no"), 100, replace=TRUE)
Year <- sample(c(2014, 2015, 2016), 100, replace=TRUE)
Beneficiaries <- round(rnorm(100, 35, 5))
dta <- data.frame(Diuretic, Year, Beneficiaries)
Now use xtabs, which sums the left-hand variable within each combination of the right-hand factors:
(dta.tbl <- xtabs(Beneficiaries~Diuretic+Year, dta))
#         Year
# Diuretic 2014 2015 2016
#      no   741  888  295
#      yes  448  649  429
Add totals:
addmargins(dta.tbl)
#         Year
# Diuretic 2014 2015 2016  Sum
#      no   741  888  295 1924
#      yes  448  649  429 1526
#      Sum 1189 1537  724 3450
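For the chi-square comparison mentioned in the question, one possible sketch is to pass the summed table straight to chisq.test. This rebuilds the same made-up data so it runs on its own; the name dta.chi is just an illustrative choice. It also assumes each cell total can be treated as a count of independent patients, which may not hold if one patient appears under several medications or years.

```r
# Rebuild the same made-up data as above
set.seed(42)
Diuretic <- sample(c("yes", "no"), 100, replace = TRUE)
Year <- sample(c(2014, 2015, 2016), 100, replace = TRUE)
Beneficiaries <- round(rnorm(100, 35, 5))
dta <- data.frame(Diuretic, Year, Beneficiaries)

# Sum Beneficiaries within each Diuretic x Year cell
dta.tbl <- xtabs(Beneficiaries ~ Diuretic + Year, data = dta)

# Column proportions: share of each class within each year
prop.table(dta.tbl, margin = 2)

# Pearson chi-square test of independence between class and year
# (assumption: cell totals behave like independent counts)
dta.chi <- chisq.test(dta.tbl)
dta.chi

# Expected counts under independence, useful for checking the test
dta.chi$expected
```

With real data, swap in your own column names for Beneficiaries, Diuretic and Year. Because the totals are large, even small differences in proportions will tend to come out significant, so it is worth looking at the proportions themselves and not only the p-value.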
Upvotes: 1