Reputation: 995
I have 100 rows of patient data stored in the object example. For each patient, we know at which of five possible hospitals they were treated, the time period in which they were treated, and how many lymph nodes they had.
set.seed(50)
example <- data.frame(
Hospital = sample(as.factor(c("Hospital 1", "Hospital 2", "Hospital 3", "Hospital 4", "Hospital 5")), size = 100, replace = TRUE),
Time = sample(as.factor(c("2000-2002", "2003-2005", "2006-2008")), size = 100, replace = TRUE),
Nodes = sample(20:100, size = 100, replace = TRUE))
I know that I can view the summary statistics for the number of lymph nodes by hospital like so. (Note that I have appended the group size "n" as the rightmost column; I am not sure whether there is a more elegant way to do this.)
cbind(do.call(rbind, by(example$Nodes, example$Hospital, summary)), table(example$Hospital, useNA = "no"))
Min. 1st Qu. Median Mean 3rd Qu. Max.
Hospital 1 20 34.25 54.0 55.55 77.75 90 22
Hospital 2 22 38.75 60.5 56.25 71.75 94 20
Hospital 3 22 37.00 51.0 57.12 81.00 96 17
Hospital 4 25 39.75 55.5 57.11 72.25 97 28
Hospital 5 26 42.00 50.0 57.00 77.00 99 13
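One possibly tidier way to get the "n" column is to compute it inside the summarising function itself, so that no separate table() call is needed. This is only a sketch: summary_n is a hypothetical helper name, not something from base R, and the exact numbers may differ from the output above because the default behaviour of sample() changed in R 3.6.0.

```r
set.seed(50)
example <- data.frame(
  Hospital = sample(as.factor(c("Hospital 1", "Hospital 2", "Hospital 3", "Hospital 4", "Hospital 5")), size = 100, replace = TRUE),
  Time = sample(as.factor(c("2000-2002", "2003-2005", "2006-2008")), size = 100, replace = TRUE),
  Nodes = sample(20:100, size = 100, replace = TRUE))

# Hypothetical helper: the six summary() statistics plus the group size,
# returned as one named numeric vector
summary_n <- function(x) c(summary(x), n = length(x))

# split() groups Nodes by hospital, sapply() applies the helper to each group,
# and t() turns the result so that hospitals are rows and statistics are columns
by_hospital <- t(sapply(split(example$Nodes, example$Hospital), summary_n))
by_hospital
```

Swapping example$Hospital for example$Time in the split() call should give the corresponding table by time period.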
Similarly, I can view them for the time period like so:
cbind(do.call(rbind, by(example$Nodes, example$Time, summary)), table(example$Time, useNA = "no"))
Min. 1st Qu. Median Mean 3rd Qu. Max.
2000-2002 20 40.00 57.0 58.84 77 97 37
2003-2005 20 33.75 45.5 52.94 78 99 36
2006-2008 23 39.50 61.0 58.33 72 98 27
I would like to create a 3-way table in which the leftmost, outermost row identifiers are the five hospitals, further sub-stratified by time period. I want the columns to be the summary statistics for the number of lymph nodes. I have a feeling that xtabs() or ftable() might help, but I have no idea how to apply them to my problem. In fact, typing ftable(example) gives me a table that is structured the way I want, but the columns are not what I want. Thanks!
Wow, yes that is almost exactly what I am looking for. My preference, however, would be for it to be in this format (with the numbers filled in, of course):
                        Nodes
                        Min.  1st Qu.  Median  Mean  3rd Qu.  Max.  n
Hospital    Time
Hospital 1  2000-2002
            2003-2005
            2006-2008
Hospital 2  2000-2002
            2003-2005
            2006-2008
....and so forth....
Upvotes: 4
Views: 2789
Reputation: 3292
Ordering the data frame that results from the aggregate() function that @AnandaMahto mentioned above gives something very close to what you need, but without the nested row labels:
dF <- aggregate(Nodes~Hospital+Time, example, summary)
dF <- dF[order(dF[, 1]), ]
Hospital Time Nodes.Min. Nodes.1st Qu. Nodes.Median Nodes.Mean Nodes.3rd Qu.
1 Hospital 1 2000-2002 20.00 25.00 34.00 33.29 38.00
6 Hospital 1 2003-2005 20.00 41.50 77.00 62.86 85.50
11 Hospital 1 2006-2008 35.00 60.50 70.50 68.62 80.75
2 Hospital 2 2000-2002 24.00 40.75 65.50 60.70 80.75
7 Hospital 2 2003-2005 22.00 22.00 26.00 33.75 37.75
12 Hospital 2 2006-2008 45.00 60.25 61.50 63.83 68.00
3 Hospital 3 2000-2002 40.00 63.00 74.00 72.80 91.00
8 Hospital 3 2003-2005 22.00 36.75 66.00 60.50 81.75
13 Hospital 3 2006-2008 23.00 29.50 37.00 40.67 46.75
4 Hospital 4 2000-2002 30.00 55.75 64.50 68.17 90.00
9 Hospital 4 2003-2005 25.00 38.25 42.00 49.36 59.50
14 Hospital 4 2006-2008 27.00 36.00 45.00 45.00 54.00
5 Hospital 5 2000-2002 26.00 39.00 52.00 51.67 64.50
10 Hospital 5 2003-2005 34.00 42.00 50.00 55.40 52.00
15 Hospital 5 2006-2008 30.00 42.00 48.00 61.80 91.00
Nodes.Max.
1 53.00
6 89.00
11 90.00
2 94.00
7 61.00
12 85.00
3 96.00
8 95.00
13 70.00
4 97.00
9 89.00
14 63.00
5 77.00
10 99.00
15 98.00
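If the nested look and the "n" column matter, the result can be pushed a little further. The following is a rough sketch rather than a definitive solution: summary_n, flat and nested are names made up for illustration, and the approach assumes it is acceptable to blank out repeated hospital labels purely for display. aggregate() stores the statistics as a single matrix column named Nodes, which is why the output above wraps; the sketch flattens that column first.

```r
set.seed(50)
example <- data.frame(
  Hospital = sample(as.factor(c("Hospital 1", "Hospital 2", "Hospital 3", "Hospital 4", "Hospital 5")), size = 100, replace = TRUE),
  Time = sample(as.factor(c("2000-2002", "2003-2005", "2006-2008")), size = 100, replace = TRUE),
  Nodes = sample(20:100, size = 100, replace = TRUE))

# Hypothetical helper: summary() statistics plus the group size
summary_n <- function(x) c(summary(x), n = length(x))

# One row per Hospital/Time combination that actually occurs in the data
dF <- aggregate(Nodes ~ Hospital + Time, data = example, FUN = summary_n)

# Flatten the matrix column into ordinary columns, then sort by hospital and period
flat <- cbind(dF[c("Hospital", "Time")], as.data.frame(dF$Nodes))
flat <- flat[order(flat$Hospital, flat$Time), ]
rownames(flat) <- NULL

# For display only: keep each hospital label on its first row and blank the repeats
nested <- flat
nested$Hospital <- as.character(nested$Hospital)
nested$Hospital[duplicated(nested$Hospital)] <- ""
print(nested, row.names = FALSE)
```

The flat data frame is the one to keep for further computation; nested is only meant for printing, since the blanked labels lose the hospital information on the repeated rows.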
Upvotes: 2