Reputation: 24545
I have the following data and code:
> dput(mydata)
structure(list(P3 = c(99.4, 105.8, 111.9), P5 = c(100.4, 106.9,
113.1), P10 = c(102, 108.6, 114.9), P25 = c(104.8, 111.6, 118.1
), P50 = c(108, 115, 121.8), P75 = c(111.2, 118.6, 125.6), P90 = c(114.3,
121.9, 129.1), P95 = c(116.1, 123.9, 131.3), P97 = c(117.4, 125.3,
132.7), val = c(115.5, 112.7, 117)), .Names = c("P3", "P5", "P10",
"P25", "P50", "P75", "P90", "P95", "P97", "val"), row.names = 7:9, class = "data.frame")
>
> mydata
     P3    P5   P10   P25   P50   P75   P90   P95   P97   val
7  99.4 100.4 102.0 104.8 108.0 111.2 114.3 116.1 117.4 115.5
8 105.8 106.9 108.6 111.6 115.0 118.6 121.9 123.9 125.3 112.7
9 111.9 113.1 114.9 118.1 121.8 125.6 129.1 131.3 132.7 117.0
I want to create a new column 'categ' in mydata. For each row, it should hold the numeric part of the name of the first column (checking from left to right) whose value is larger than that row's 'val'.
Hence, I should get 95, 50, 25 in the new column.
I know the 'findInterval' and 'match' functions are used for this kind of classification, but I have not been able to apply them to mydata. Thanks for your help.
Upvotes: 3
Views: 76
Reputation: 21502
To answer the follow-up question about speed:
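# Build a larger test set: doubling the 3 rows 10 times gives 3 * 2^10 = 3072 rows
bigdat <- mydata
for (j in 1:10) bigdat <- rbind(bigdat, bigdat)
# Vectorised approach: max.col() locates the first TRUE in each row
frist <- function(mydata) {
  indx <- max.col(mydata[, -10] > mydata$val, 'first')
  mydata$categ <- as.numeric(sub("[A-Z]+", "", names(mydata)[indx]))
}
# Row-wise approach: apply() scans each row for the first column exceeding val
sceond <- function(mydata) indx <- apply(mydata[, -10] > mydata$val, 1, function(x) names(which(x))[1])
library(microbenchmark)
microbenchmark(frist(bigdat), sceond(bigdat))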
Unit: milliseconds
           expr       min        lq    median        uq      max neval
  frist(bigdat)  5.400829  5.688074  7.166702  7.816168 142.6927   100
 sceond(bigdat) 22.333659 24.442536 25.422791 26.984677 178.7408   100
EDIT: per akrun's comment, I added the same regex line to the sceond function, but it doesn't affect the timing:
sceond <- function(mydata) {
indx <- apply(mydata[,-10] > mydata$val, 1, function(x) names(which(x))[1])
mydata$categ <- as.numeric(sub("[A-Z]+", "", names(mydata)[indx]))
}
Unit: milliseconds
           expr       min        lq    median        uq       max neval
  frist(bigdat)  5.315901  5.613826  6.940932  7.791208  29.15699   100
 sceond(bigdat) 22.359897 24.588688 25.636795 27.868710 359.79325   100
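Before reading too much into timings like these, it is worth checking that both approaches return the same values on the enlarged data. The sketch below is one way to do that in base R. The names by_max_col and by_apply are hypothetical stand-ins for the two functions above, rewritten to return their result explicitly, and the data frame is a cut-down copy of the question's data with fewer columns.

```r
# Cut-down copy of the question's data (assumption: fewer columns, same values)
mydata <- data.frame(
  P25 = c(104.8, 111.6, 118.1),
  P50 = c(108.0, 115.0, 121.8),
  P90 = c(114.3, 121.9, 129.1),
  P95 = c(116.1, 123.9, 131.3),
  val = c(115.5, 112.7, 117.0)
)

# Same doubling trick as above: 3 rows become 3 * 2^10 = 3072 rows
bigdat <- mydata
for (j in 1:10) bigdat <- rbind(bigdat, bigdat)

# Hypothetical name: vectorised version based on max.col()
by_max_col <- function(d) {
  pct  <- d[, names(d) != "val"]
  indx <- max.col(pct > d$val, "first")
  as.numeric(sub("[A-Z]+", "", names(pct)[indx]))
}

# Hypothetical name: row-wise version based on apply()
by_apply <- function(d) {
  pct <- d[, names(d) != "val"]
  nm  <- apply(pct > d$val, 1, function(x) names(which(x))[1])
  as.numeric(sub("[A-Z]+", "", nm))
}

res_max_col <- by_max_col(bigdat)
res_apply   <- by_apply(bigdat)

# Both approaches should agree on every row
stopifnot(identical(res_max_col, res_apply))
head(res_max_col, 3)
# [1] 95 50 25
```

Note that the two versions only agree when every row has at least one column above val; a row with none gives NA from the apply version but column 1 from max.col.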
Upvotes: 1
Reputation: 887158
You could try
indx <- max.col(mydata[,-10] >mydata$val,'first')
mydata$categ <- as.numeric(sub("[A-Z]+", "", names(mydata)[indx]))
mydata$categ
#[1] 95 50 25
Or
indx <- apply(mydata[,-10] > mydata$val, 1, function(x) names(which(x))[1])
and then use sub as before.
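Since the question mentions findInterval, here is a row-wise sketch with it as well. It is only an illustration, not a replacement for the options above, and it assumes the percentile columns are in non-decreasing order within each row (as they are here). The data frame is a cut-down copy of the question's data with fewer columns.

```r
# Cut-down copy of the question's data (assumption: fewer columns, same values)
mydata <- data.frame(
  P25 = c(104.8, 111.6, 118.1),
  P50 = c(108.0, 115.0, 121.8),
  P75 = c(111.2, 118.6, 125.6),
  P90 = c(114.3, 121.9, 129.1),
  P95 = c(116.1, 123.9, 131.3),
  P97 = c(117.4, 125.3, 132.7),
  val = c(115.5, 112.7, 117.0)
)

pct <- as.matrix(mydata[, names(mydata) != "val"])

# findInterval(x, breaks) counts how many breaks are <= x,
# so the next column is the first one strictly greater than val
indx <- vapply(
  seq_len(nrow(pct)),
  function(i) findInterval(mydata$val[i], pct[i, ]) + 1L,
  integer(1)
)

# If val is at or above every percentile there is no such column
indx[indx > ncol(pct)] <- NA

categ <- as.numeric(sub("^P", "", colnames(pct)[indx]))
categ
# [1] 95 50 25
```

This loops over rows, so it is unlikely to be faster than max.col; its main appeal is that it returns NA when no column exceeds val.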
mydata <- structure(list(P3 = c(99.4, 105.8, 111.9), P5 = c(100.4, 106.9,
113.1), P10 = c(102, 108.6, 114.9), P25 = c(104.8, 111.6, 118.1
), P50 = c(108, 115, 121.8), P75 = c(111.2, 118.6, 125.6), P90 = c(114.3,
121.9, 129.1), P95 = c(116.1, 123.9, 131.3), P97 = c(117.4, 125.3,
132.7), val = c(115.5, 112.7, 117)), .Names = c("P3", "P5", "P10",
"P25", "P50", "P75", "P90", "P95", "P97", "val"), class = "data.frame",
row.names = c("7", "8", "9"))
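One caveat with the max.col approach: if a row has no column larger than val, every entry in that row of the logical matrix is FALSE, and max.col(..., 'first') returns 1, which would wrongly report the first column. A minimal sketch of a guard follows. The helper name first_above and the toy data frame are made up for illustration and are not part of the original answer.

```r
# Toy data (assumption): the second row has val above every percentile
toy <- data.frame(
  P90 = c(114.3, 121.9),
  P95 = c(116.1, 123.9),
  P97 = c(117.4, 125.3),
  val = c(115.5, 130.0)
)

# Hypothetical helper: like the max.col approach, but NA when nothing exceeds val
first_above <- function(df, val_col = "val") {
  pct   <- df[, setdiff(names(df), val_col), drop = FALSE]
  above <- as.matrix(pct) > df[[val_col]]   # logical matrix, one row per observation
  indx  <- max.col(above, ties.method = "first")
  indx[rowSums(above) == 0] <- NA           # no column is larger than val
  as.numeric(sub("^[A-Za-z]+", "", names(pct)[indx]))
}

# Unguarded version for comparison: row 2 falls back to column 1 (P90)
unguarded <- max.col(as.matrix(toy[, -4]) > toy$val, "first")
unguarded
# [1] 2 1

guarded <- first_above(toy)
guarded
# [1] 95 NA
```

Whether NA is the right answer for such rows depends on the use case; another option would be a label such as ">97".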
Upvotes: 3