Rain_City
Rain_City

Reputation: 21

aggregate function in R (dealing with NA's)

sorry if this question has already been ensered but I couldn't find what i needed...

this is my hypothetical database:

x1=c("A", "A", "B", "C", "C", "B")
x2=c("L1", "L1", "L1", "L1", "L2", "L1")
x3=c("a", "a", "NA", "b", "j","NA" )
x4=c(17, 17, 13.2, NA, 3, 13.2)
x5=c(1,24,5,7,6,8)
db=as.data.frame(cbind(x1, x2, x3, x4, x5))

I tried many different things, but this is basicaly the idea

dbF=aggregate(db$x5,by=list(db$x1, db$x2, db$x3,db$x4),FUN=sum)

the expected output is this:

x1e=c("A", "B", "C", "C")
x2e=c("L1", "L1", "L1", "L2")
x3e=c("a", "NA", "b", "j")                 
x4e=c(17, 13.2, NA, 3)
x5e=c(25,13,7,6)
dbExpected=as.data.frame(cbind(x1e, x2e, x3e, x4e, x5e))

I realy need to keep the NA's in the final output....any suggestions? thx in advance

Upvotes: 1

Views: 92

Answers (2)

maloneypatr
maloneypatr

Reputation: 3622

You can use dplyr and some of your functions are redundant.

# install.packages('dplyr') # only run if not installed
library(dplyr)

x1=c("A", "A", "B", "C", "C", "B")
x2=c("L1", "L1", "L1", "L1", "L2", "L1")
x3=c("a", "a", "NA", "b", "j","NA" )
x4=c(17, 17, 13.2, NA, 3, 13.2)
x5=c(1,24,5,7,6,8)
db=data.frame(x1, x2, x3, x4, x5)

db %>%
  group_by(x1, x2, x3, x4) %>%
  dplyr::summarise(x5e = sum(x5))

Source: local data frame [4 x 5]
Groups: x1, x2, x3 [?]

      x1     x2     x3    x4   x5e
  (fctr) (fctr) (fctr) (dbl) (dbl)
1      A     L1      a  17.0    25
2      B     L1     NA  13.2    13
3      C     L1      b    NA     7
4      C     L2      j   3.0     6

Upvotes: 1

Rorschach
Rorschach

Reputation: 32466

Couple things: when you make your data.frame like that (cbind then coerce) you are making an intermediate matrix of characters, so when you coerce to a data.frame everything is a factor (not wanted for obvious reasons since x5 should be numeric). Also, make sure that the x4 variable has a NA level (here using addNA, so when you aggregate by it, you get what you want.

x1=c("A", "A", "B", "C", "C", "B")
x2=c("L1", "L1", "L1", "L1", "L2", "L1")
x3=c("a", "a", "NA", "b", "j","NA" )
x4=addNA(factor(c(17, 17, 13.2, NA, 3, 13.2)))
x5=c(1,24,5,7,6,8)
db=data.frame(x1, x2, x3, x4, x5)

dbF=aggregate(x5 ~ x1+x2+x3+x4, data=db, FUN=sum, na.action=na.pass)
dbF
#  x1 x2 x3   x4 x5
# 1  C L2  j    3  6
# 2  B L1 NA 13.2 13
# 3  A L1  a   17 25
# 4  C L1  b <NA>  7

Upvotes: 3

Related Questions