Reputation: 353
I have some data where we apply multiple tests (called parameters) to different "die", and each "die" can either pass or fail a given test.
Here is a small portion of a dataframe named alldie:
  die parameter                        firstfailure
1 1   Resistance_Test DevID (Ohms) 428 FALSE
2 1   Diode_Test SUBLo (V) 353         FALSE
3 1   Gate_Test V1_WELL (V) 361        FALSE
4 1   Gate_Test V2_WELL (V) 360        FALSE
5 1   Gate_Test V3_WELL (V) 361        FALSE
6 1   Class_Test Cluster Class2 (#) 6  FALSE
7 1   Class_Test Column Class1 (#) 2   TRUE
8 1   Class_Test Cluster Class1 (#) 2  NA
If I provided the full dataset, you'd see multiple die (numbered 1,2,3,...), many more different parameters, and under firstfailure, you would see FALSE (die passed) or TRUE (die failed) and occasionally NA if the test wasn't performed.
I thought I could compute the number of die going through each test (parameter), the number that passed, and the proportion that passed, by writing a function and then using tapply:
ly <- function(data) {
  ndie <- sum(!is.na(data))                # die actually tested (NA = not performed)
  npass <- ndie - sum(data, na.rm = TRUE)  # tested die minus failures (TRUE = failed)
  yield <- npass / ndie                    # proportion that passed
  c(npass, ndie, yield)
}
This does the calculations I want, but produces output that is difficult to use:
lim_yld <- tapply(alldie$firstfailure, alldie$parameter, ly)
Then lim_yld looks like this (first few rows only; note that tapply puts the parameters in alphabetical order):
$`Class_Test Cluster Class1 (#) 2`
[1] 76 76 1
$`Class_Test Cluster Class2 (#) 6`
[1] 89 89 1
$`Class_Test Column Class1 (#) 2`
[1] 76.0000000 89.0000000 0.8539326
Questions:
How can I get the data into a dataframe that is more readable? Something like this:
Parameter                        Npass  Ndie  Proportion
Class_Test Cluster Class1 (#) 2  76     76    1.0000000
Class_Test Cluster Class2 (#) 6  89     89    1.0000000
Class_Test Column Class1 (#) 2   76     89    0.8539326
How can I sort the parameters in this dataframe in the original order?
Thanks!
Upvotes: 1
Views: 241
Reputation: 24069
How about this as a solution? Take the result of the tapply and convert it to a dataframe. Then add the column headings and parameter names:
df<-as.data.frame(matrix(unlist(lim_yld), ncol=3, byrow=TRUE))
names(df)<-c("npass","ndie","yield")
df<-cbind(parameter=names(lim_yld), df)
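To see why the reshaping step works, here is a minimal sketch on a made-up list standing in for lim_yld (`toy` is a hypothetical name; its values are copied from the sample output in the question). unlist() flattens the list one element at a time, so byrow=TRUE gives each parameter its own row. do.call(rbind, ...) is a common alternative that should give the same numbers:

```r
# Toy stand-in for lim_yld: a named list of length-3 numeric vectors
# (hypothetical object; values taken from the question's sample output)
toy <- list(
  "Class_Test Cluster Class1 (#) 2" = c(76, 76, 1),
  "Class_Test Cluster Class2 (#) 6" = c(89, 89, 1),
  "Class_Test Column Class1 (#) 2"  = c(76, 89, 76 / 89)
)

# unlist() flattens element by element, so byrow = TRUE yields one row per parameter
df <- as.data.frame(matrix(unlist(toy), ncol = 3, byrow = TRUE))
names(df) <- c("npass", "ndie", "yield")
df <- cbind(parameter = names(toy), df)

# Alternative: stack the vectors as rows of a matrix; list names become row names
m <- do.call(rbind, toy)

print(df)
```

Both approaches assume every element of the list has the same length (three here), which holds as long as the function passed to tapply always returns three values.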
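As the comments above mention, this is not very generic with respect to the column names, but it does match what your function returns. tapply appears to return the list in reverse order here (it sorts the parameters alphabetically). To restore the original order, sort each parameter by its first appearance in alldie; this should work:
df<-df[order(match(df$parameter, unique(alldie$parameter))),]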
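Another possible route for the ordering question, offered as a sketch rather than a tested solution on the real data: tapply groups by factor levels, so if parameter is turned into a factor whose levels follow first appearance, the result should already come out in the original order. The data frame `toy_die` below is invented for illustration (it is not the asker's data), and `ly` is the function from the question:

```r
# Made-up data in the same shape as alldie (hypothetical values)
toy_die <- data.frame(
  die = c(1, 1, 1, 2, 2, 2),
  parameter = c("Resistance_Test", "Diode_Test", "Class_Test",
                "Resistance_Test", "Diode_Test", "Class_Test"),
  firstfailure = c(FALSE, FALSE, TRUE, FALSE, TRUE, NA),
  stringsAsFactors = FALSE
)

# Yield function from the question
ly <- function(data) {
  ndie  <- sum(!is.na(data))               # die actually tested
  npass <- ndie - sum(data, na.rm = TRUE)  # tested die minus failures
  c(npass, ndie, npass / ndie)
}

# Levels in order of first appearance, so tapply keeps the original order
param <- factor(toy_die$parameter, levels = unique(toy_die$parameter))
res <- tapply(toy_die$firstfailure, param, ly)

# One row per parameter, with the column names requested in the question
out <- data.frame(parameter = names(res),
                  matrix(unlist(res), ncol = 3, byrow = TRUE),
                  row.names = NULL)
names(out) <- c("Parameter", "Npass", "Ndie", "Proportion")
print(out)
```

With this toy input the rows come out as Resistance_Test, Diode_Test, Class_Test, the order in which they first appear, rather than alphabetically.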
Upvotes: 1