Dan
Dan

Reputation: 286

aggregate using "factors" that are NA

I'm struggling to aggregate a data frame into the format I want. The data frame contains a series of parts, along with a list of tests that are performed (Length and Width), and a lower and upper limit (LL and UL) for each measurement. Some of the tests don't have one or the other limit. I'm trying to get a count of how many parts have a given "test-LL-UL" combination, including those tests with NA as one of the limits.

What I've tried so far is the following:

df<-read.table(header = TRUE, text = "
Part Test   LL  UL
A    L      20  40
A    W      5   7
B    L      20  NA
B    W      5   7
C    L      20  40
C    W      10  30
")
aggregate(data=df,Part~Test+LL+UL,FUN=length,na.action=na.pass)

This gives the following output:

  Test  LL  UL Part
1    W   5   7    2
2    W  10  30    1
3    L  20  40    2

What I was expecting to get was:

  Test  LL  UL Part
1    W   5   7    2
2    W  10  30    1
3    L  20  40    2
4    L  20  NA    1

Any help would be greatly appreciated!

Upvotes: 2

Views: 85

Answers (3)

Brandon Loudermilk
Brandon Loudermilk

Reputation: 970

Package {dplyr} can be utilized with functions group_by() and summarize():

df <- data.frame(Part = c("A","A","B","B","C","C"), 
                 Test = c("L","W","L","W","L","W"),
                 LL = c(20,5,20,5,20,10),
                 UL = c(40,7,NA,7,40,30))

grouped <- dplyr::group_by(df, Test, LL, UL)
summarize(grouped, count = n())

##     Test    LL    UL count
##  (fctr) (dbl) (dbl) (int)
##1      L    20    40     2
##2      L    20    NA     1
##3      W     5     7     2
##4      W    10    30     1

Upvotes: 1

David Heckmann
David Heckmann

Reputation: 2939

dplyr handles this quite nicely:

   library(dplyr)
   df %>% group_by(Test,LL,UL)  %>% summarise( n() )

Upvotes: 2

Dan
Dan

Reputation: 286

In line with Jimbou's suggestion, the following works (but feels a little messy):

df<-read.table(header = TRUE, text = "
               Part Test   LL  UL
               A    L      20  40
               A    W      5   7
               B    L      20  NA
               B    W      5   7
               C    L      20  40
               C    W      10  30
               ")
df[is.na(df)] <- "NA"
df<-aggregate(data=df,Part~Test+LL+UL,FUN=length,na.action=na.pass)
df$UL<-as.numeric(df$UL)

I think the appropriate thing to do is to set the Upper Limits to Inf and the Lower Limits to -Inf (this more accurately reflects the meaning of the limits). In this case, the aggregate works as I'd expect.

Upvotes: 0

Related Questions