Reputation: 526
I have a two-parter question. In creating a dataset similar to what I have, we can start with the airquality dataset:
myaqm <- melt(airquality, id=c("Month", "Day"), na.rm = TRUE)
myaqm_dcast <- dcast(myaqm, Day+variable~Month,
value.var = "value", sum, margins=c("Day", "variable", "Month"))
Giving us this:
> head(myaqm_dcast, n=10)
Day variable 5 6 7 8 9 (all)
1 1 Ozone 41.0 0.0 135.0 39.0 96.0 311.0
2 1 Solar.R 190.0 286.0 269.0 83.0 167.0 995.0
3 1 Wind 7.4 8.6 4.1 6.9 6.9 33.9
4 1 Temp 67.0 78.0 84.0 81.0 91.0 401.0
5 1 (all) 305.4 372.6 492.1 209.9 360.9 1740.9
6 2 Ozone 36.0 0.0 49.0 9.0 78.0 172.0
7 2 Solar.R 118.0 287.0 248.0 24.0 197.0 874.0
8 2 Wind 8.0 9.7 9.2 13.8 5.1 45.8
9 2 Temp 72.0 74.0 85.0 81.0 92.0 404.0
10 2 (all) 234.0 370.7 391.2 127.8 372.1 1495.8
First question: How can you remove Wind from each daily (all) sub-total?
I'm sure we're meant to use dplyr
, perhaps wholly instead of dcast
, however for the life of me I can't find an example which demonstrates sub-total exclusions.
Second question: Similarly we're also trying to work out how to include a daily average row of all variables, excluding Temp. So the dataset would look something like this:
Day variable 5 6 7 8 9 (all)
1 1 Ozone 41.0 0.0 135.0 39.0 96.0 311.0
2 1 Solar.R 190.0 286.0 269.0 83.0 167.0 995.0
3 1 Wind 7.4 8.6 4.1 6.9 6.9 33.9
4 1 Temp 67.0 78.0 84.0 81.0 91.0 401.0
5 1 Day Avg 79.5 98.2 136.0 43.0 90.0 446.6
6 1 (all) 298.0 364.0 488.0 203.0 354.0 1707.0
7 2 Ozone 36.0 0.0 49.0 9.0 78.0 172.0
8 2 Solar.R 118.0 287.0 248.0 24.0 197.0 874.0
9 2 Wind 8.0 9.7 9.2 13.8 5.1 45.8
10 2 Temp 72.0 74.0 85.0 81.0 92.0 404.0
11 2 Day Avg 54.0 98.9 102.1 15.6 93.4 363.9
12 2 (all) 226.0 361.0 382.0 114.0 367.0 1450.0
Here, Day Avg is the mean of Ozone, Solar.R and Wind -- does not include Temp.
And (all) is the sum of Ozone, Solar.R and Temp -- does not include Wind.
I think it's another case for dplyr
, and it would be a similar solution, but I thought I'd ask both questions.
Edit: It's worth mentioning that my real-world data has much more than four variables each 'day', some of which don't actually appear each day, but nevertheless need to be included in the daily 'Day Avg' and '(all)' calculations, so it would be cleaner to exclude variables by name (ie. Temp or Wind) rather than naming all the variables than need to be included (ie. Ozone and Solar.R).
Upvotes: 1
Views: 281
Reputation: 93811
The code below calculates the two column summaries with the data in long format and then adds them to the long data frame, rather than doing the calculations during casting, and then adds the row totals at the end:
library(reshape2)
library(dplyr)
myaqm <- melt(airquality, id=c("Month", "Day"), na.rm = TRUE) %>%
group_by(Month, Day)
myaqm_summary = myaqm %>%
bind_rows(myaqm %>%
summarise(value = sum(value[variable != "Wind"], na.rm=TRUE),
variable = "All")) %>%
bind_rows(myaqm %>%
summarise(value = mean(value[variable != "Temp"]),
variable = "Day Avg")) %>%
dcast(Day + variable ~ Month, value.var = "value") %>%
mutate(All = rowSums(.[, grepl("[5-9]", names(.))], na.rm=TRUE))
Here are the first 12 lines of output:
Day variable 5 6 7 8 9 All 1 1 All 298.00000 364.00 488.0000 203.00000 354.00000 1707.0000 2 1 Day Avg 79.46667 147.30 136.0333 42.96667 89.96667 495.7333 3 1 Ozone 41.00000 NA 135.0000 39.00000 96.00000 311.0000 4 1 Solar.R 190.00000 286.00 269.0000 83.00000 167.00000 995.0000 5 1 Temp 67.00000 78.00 84.0000 81.00000 91.00000 401.0000 6 1 Wind 7.40000 8.60 4.1000 6.90000 6.90000 33.9000 7 2 All 226.00000 361.00 382.0000 114.00000 367.00000 1450.0000 8 2 Day Avg 54.00000 148.35 102.0667 15.60000 93.36667 413.3833 9 2 Ozone 36.00000 NA 49.0000 9.00000 78.00000 172.0000 10 2 Solar.R 118.00000 287.00 248.0000 24.00000 197.00000 874.0000 11 2 Temp 72.00000 74.00 85.0000 81.00000 92.00000 404.0000 12 2 Wind 8.00000 9.70 9.2000 13.80000 5.10000 45.8000
Upvotes: 1