Reputation: 286
I'm struggling to aggregate a data frame into the format I want. The data frame contains a series of parts, along with a list of tests that are performed (Length and Width), and a lower and upper limit (LL and UL) for each measurement. Some of the tests don't have one or the other limit. I'm trying to get a count of how many parts have a given "test-LL-UL" combination, including those tests with NA as one of the limits.
What I've tried so far is the following:
df<-read.table(header = TRUE, text = "
Part Test LL UL
A L 20 40
A W 5 7
B L 20 NA
B W 5 7
C L 20 40
C W 10 30
")
aggregate(data=df,Part~Test+LL+UL,FUN=length,na.action=na.pass)
This gives the following output:
Test LL UL Part
1 W 5 7 2
2 W 10 30 1
3 L 20 40 2
What I was expecting to get was:
Test LL UL Part
1 W 5 7 2
2 W 10 30 1
3 L 20 40 2
4 L 20 NA 1
Any help would be greatly appreciated!
Upvotes: 2
Views: 85
Reputation: 970
Package {dplyr} can be utilized with functions group_by()
and summarize()
:
df <- data.frame(Part = c("A","A","B","B","C","C"),
Test = c("L","W","L","W","L","W"),
LL = c(20,5,20,5,20,10),
UL = c(40,7,NA,7,40,30))
grouped <- dplyr::group_by(df, Test, LL, UL)
summarize(grouped, count = n())
## Test LL UL count
## (fctr) (dbl) (dbl) (int)
##1 L 20 40 2
##2 L 20 NA 1
##3 W 5 7 2
##4 W 10 30 1
Upvotes: 1
Reputation: 2939
dplyr
handles this quite nicely:
library(dplyr)
df %>% group_by(Test,LL,UL) %>% summarise( n() )
Upvotes: 2
Reputation: 286
In line with Jimbou's suggestion, the following works (but feels a little messy):
df<-read.table(header = TRUE, text = "
Part Test LL UL
A L 20 40
A W 5 7
B L 20 NA
B W 5 7
C L 20 40
C W 10 30
")
df[is.na(df)] <- "NA"
df<-aggregate(data=df,Part~Test+LL+UL,FUN=length,na.action=na.pass)
df$UL<-as.numeric(df$UL)
I think the appropriate thing to do is to set the Upper Limits to Inf and the Lower Limits to -Inf (this more accurately reflects the meaning of the limits). In this case, the aggregate works as I'd expect.
Upvotes: 0