Belinda-Jane Netan
Belinda-Jane Netan

Reputation: 526

R dcast: applying value/conditional exclusions to sub-totals

I have a two-parter question. In creating a dataset similar to what I have, we can start with the airquality dataset:

myaqm <- melt(airquality, id=c("Month", "Day"), na.rm = TRUE)

myaqm_dcast <- dcast(myaqm, Day+variable~Month, 
        value.var = "value", sum, margins=c("Day", "variable", "Month"))

Giving us this:

> head(myaqm_dcast, n=10)
   Day variable     5     6     7     8     9  (all)
1    1    Ozone  41.0   0.0 135.0  39.0  96.0  311.0
2    1  Solar.R 190.0 286.0 269.0  83.0 167.0  995.0
3    1     Wind   7.4   8.6   4.1   6.9   6.9   33.9
4    1     Temp  67.0  78.0  84.0  81.0  91.0  401.0
5    1    (all) 305.4 372.6 492.1 209.9 360.9 1740.9
6    2    Ozone  36.0   0.0  49.0   9.0  78.0  172.0
7    2  Solar.R 118.0 287.0 248.0  24.0 197.0  874.0
8    2     Wind   8.0   9.7   9.2  13.8   5.1   45.8
9    2     Temp  72.0  74.0  85.0  81.0  92.0  404.0
10   2    (all) 234.0 370.7 391.2 127.8 372.1 1495.8

First question: How can you remove Wind from each daily (all) sub-total?

I'm sure we're meant to use dplyr, perhaps wholly instead of dcast, however for the life of me I can't find an example which demonstrates sub-total exclusions.

Second question: Similarly we're also trying to work out how to include a daily average row of all variables, excluding Temp. So the dataset would look something like this:

    Day variable     5     6     7     8     9  (all)
1    1    Ozone  41.0   0.0 135.0  39.0  96.0  311.0
2    1  Solar.R 190.0 286.0 269.0  83.0 167.0  995.0
3    1     Wind   7.4   8.6   4.1   6.9   6.9   33.9
4    1     Temp  67.0  78.0  84.0  81.0  91.0  401.0
5    1  Day Avg  79.5  98.2 136.0  43.0  90.0  446.6
6    1    (all) 298.0 364.0 488.0 203.0 354.0 1707.0

7    2    Ozone  36.0   0.0  49.0   9.0  78.0  172.0
8    2  Solar.R 118.0 287.0 248.0  24.0 197.0  874.0
9    2     Wind   8.0   9.7   9.2  13.8   5.1   45.8
10   2     Temp  72.0  74.0  85.0  81.0  92.0  404.0
11   2  Day Avg  54.0  98.9 102.1  15.6  93.4  363.9
12   2    (all) 226.0 361.0 382.0 114.0 367.0 1450.0

Here, Day Avg is the mean of Ozone, Solar.R and Wind -- does not include Temp.

And (all) is the sum of Ozone, Solar.R and Temp -- does not include Wind.

I think it's another case for dplyr, and it would be a similar solution, but I thought I'd ask both questions.

Edit: It's worth mentioning that my real-world data has much more than four variables each 'day', some of which don't actually appear each day, but nevertheless need to be included in the daily 'Day Avg' and '(all)' calculations, so it would be cleaner to exclude variables by name (ie. Temp or Wind) rather than naming all the variables than need to be included (ie. Ozone and Solar.R).

Upvotes: 1

Views: 281

Answers (1)

eipi10
eipi10

Reputation: 93811

The code below calculates the two column summaries with the data in long format and then adds them to the long data frame, rather than doing the calculations during casting, and then adds the row totals at the end:

library(reshape2)
library(dplyr)

myaqm <- melt(airquality, id=c("Month", "Day"), na.rm = TRUE) %>%   
           group_by(Month, Day)

myaqm_summary = myaqm %>% 
  bind_rows(myaqm %>% 
              summarise(value = sum(value[variable != "Wind"], na.rm=TRUE),
                        variable = "All")) %>%
  bind_rows(myaqm %>% 
              summarise(value = mean(value[variable != "Temp"]),
                        variable = "Day Avg")) %>%
  dcast(Day + variable ~ Month, value.var = "value") %>%
  mutate(All = rowSums(.[, grepl("[5-9]", names(.))], na.rm=TRUE))

Here are the first 12 lines of output:

   Day variable         5      6        7         8         9       All
1    1      All 298.00000 364.00 488.0000 203.00000 354.00000 1707.0000
2    1  Day Avg  79.46667 147.30 136.0333  42.96667  89.96667  495.7333
3    1    Ozone  41.00000     NA 135.0000  39.00000  96.00000  311.0000
4    1  Solar.R 190.00000 286.00 269.0000  83.00000 167.00000  995.0000
5    1     Temp  67.00000  78.00  84.0000  81.00000  91.00000  401.0000
6    1     Wind   7.40000   8.60   4.1000   6.90000   6.90000   33.9000
7    2      All 226.00000 361.00 382.0000 114.00000 367.00000 1450.0000
8    2  Day Avg  54.00000 148.35 102.0667  15.60000  93.36667  413.3833
9    2    Ozone  36.00000     NA  49.0000   9.00000  78.00000  172.0000
10   2  Solar.R 118.00000 287.00 248.0000  24.00000 197.00000  874.0000
11   2     Temp  72.00000  74.00  85.0000  81.00000  92.00000  404.0000
12   2     Wind   8.00000   9.70   9.2000  13.80000   5.10000   45.8000

Upvotes: 1

Related Questions