Ben
Ben

Reputation: 6761

Process NA in a specific way with aggregate function

I have a data frame that looks like this:

Project Week Number
Project1   01  46.0
Project2   01  46.4
Project3   01 105.0
Project1   02  70.0
Project2   02  84.0
Project3   02  34.8
Project1   03  83.0
Project3   03  37.9

Edit:

> dput(my.df)
structure(list(Project = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 
1L, 3L), .Label = c("Project1", "Project2", "Project3"), class = "factor"), 
    Week = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), Number = c(46, 
    46.4, 105, 70, 84, 34.8, 83, 37.9)), .Names = c("Project", 
"Week", "Number"), class = "data.frame", row.names = c(NA, -8L
))

I want to compute the sum for each project for each week.

So I use the aggregate function:

aggregate(Number ~ Project + Week, data = my.df, sum)

As you can see, there is no value for the Project2 in Week 3.

Using the aggregate function just leaves it blank. What I want is to have the line filled in with 0.

I tried:

aggregate(Number ~ Project + Week, data = my.df, sum, na.action = 0)

and

aggregate(Number ~ Project + Week, data = my.df, sum, na.action = function(x) 0)

But none work. Any idea?

Upvotes: 0

Views: 61

Answers (4)

moodymudskipper
moodymudskipper

Reputation: 47350

You can do this in base R, it's pretty much tidyr::complete 's code translated in base R (see @www's answer).

df <- merge(
  setNames(expand.grid(unique(df$Project),unique(df$Week)),c("Project","Week")),
  df, all.x=TRUE)
df$Number[is.na(df$Number)] <- 0

Upvotes: 1

jogo
jogo

Reputation: 12569

You can use xtabs():

my.df <- read.table(header=TRUE, text=
'Project Week Number
Project1   01  46.0
Project2   01  46.4
Project3   01 105.0
Project1   02  70.0
Project2   02  84.0
Project3   02  34.8
Project1   03  83.0
Project3   03  37.9')
my.df$Week <- paste0("0", my.df$Week)

xtabs(Number ~ Project+Week, data=my.df)
#           Week
# Project       01    02    03
#   Project1  46.0  70.0  83.0
#   Project2  46.4  84.0   0.0
#   Project3 105.0  34.8  37.9
as.data.frame(xtabs(Number ~ Project+Week, data=my.df))
#    Project Week  Freq
# 1 Project1   01  46.0
# 2 Project2   01  46.4
# 3 Project3   01 105.0
# 4 Project1   02  70.0
# 5 Project2   02  84.0
# 6 Project3   02  34.8
# 7 Project1   03  83.0
# 8 Project2   03   0.0
# 9 Project3   03  37.9

Upvotes: 3

Bertil Baron
Bertil Baron

Reputation: 5003

Or you can use spread from tidyr with fill = 0

aggregate(Number ~ Project + Week, data = my.df, sum) %>% 
  spread(key = Week,value = Number,fill = 0)

and then use gather to get it back to your original form

aggregate(Number ~ Project + Week, data = my.df, sum) %>% 
  spread(key = Week,value = Number,fill = 0) %>% 
  gather(key = Week, value = Number,`1`,`2`,`3`)

Upvotes: 2

www
www

Reputation: 39174

We can also use the complete function from the tidyr package to fill in the value of Project2 in Week 3. After that, we can aggregate the data.

library(tidyr)

my.df2 <- my.df %>% 
  complete(Project, Week, fill = list(Number = 0))

my.df2

# # A tibble: 9 x 3
#    Project  Week Number
#      <chr> <chr>  <dbl>
# 1 Project1    01   46.0
# 2 Project1    02   70.0
# 3 Project1    03   83.0
# 4 Project2    01   46.4
# 5 Project2    02   84.0
# 6 Project2    03    0.0
# 7 Project3    01  105.0
# 8 Project3    02   34.8
# 9 Project3    03   37.9

DATA

my.df <- read.table(text = "Project Week Number
Project1   '01'  46.0
                 Project2   01  46.4
                 Project3   01 105.0
                 Project1   02  70.0
                 Project2   02  84.0
                 Project3   02  34.8
                 Project1   03  83.0
                 Project3   03  37.9",
                 header = TRUE, stringsAsFactors = FALSE)

my.df$Week <- paste0("0", my.df$Week)

Upvotes: 2

Related Questions