Reputation: 23
I have a 96 x 48 dataframe df. The first column is an identifying field (char), columns 2 - 48 are numeric values. I also have two numeric vectors with 96 elements each, consisting of upper and lower bounds that correspond to each row.
I would like to create a new dataframe with an identical column 1, but for columns 2-48 I would like to see if the value is between the values in the two vectors for each row. Then I'd like to have 1 in the new data frame if it is, 0 if it is not (a boolean of sorts).
example:
df:
1 2 3 4 .. 48 a 7 11 15 58 b 6 9 13 46 c 8 14 20 73
vectors:
upper: 24, 35, 22, 63 lower: 10, 11, 12, 11
return:
1 2 3 4 .. 48 a 0 1 1 0 (between upper[1] and lower[1]) b 0 0 1 0 (between upper[2] and lower[2]) c 0 1 1 0 ...
I'd like to do this without a loop since I'm pretty sure there's a way to do this, but I can't seem to find it.
Upvotes: 2
Views: 1373
Reputation: 691
Another option is to just use apply over columns. I think it is pretty simple and clean.
df <- data.frame(V2=c(7,6,8), V3=c(11,9,14), V4=c(15,13,20), V48=c(58,46,73))
upper <- c(24, 35, 22)
lower <- c(10, 11, 12)
data.frame(apply(df,2,function(x)((upper>=x)*(x>=lower))))
V2 V3 V4 V48
1 0 1 1 0
2 0 0 1 0
3 0 1 1 0
EDIT: After MKR comment, I became curious and had to test performance. If there is any suggestion on how to measure it in a different way, please comment.
df <- data.frame(V2=c(7,6,8), V3=c(11,9,14), V4=c(15,13,20), V48=c(58,46,73))
upper <- c(24, 35, 22)
lower <- c(10, 11, 12)
start.time <- Sys.time()
data.frame(apply(df,2,function(x)((upper>=x)*(x>=lower))))
#V2 V3 V4 V48
#1 0 1 1 0
#2 0 0 1 0
#3 0 1 1 0
Sys.time()-start.time
#Time difference of 0.0146079 secs
start.time <- Sys.time()
data.frame(apply(df,2,function(x)(as.numeric((upper>=x)&(x>=lower)))))
#V2 V3 V4 V48
#1 0 1 1 0
#2 0 0 1 0
#3 0 1 1 0
Sys.time()-start.time
#Time difference of 0.0124476 secs
start.time <- Sys.time()
data.frame(ifelse(upper > df[] & lower < df[], 1, 0))
#V2 V3 V4 V48
#1 0 1 1 0
#2 0 0 1 0
#3 0 1 1 0
Sys.time()-start.time
#Time difference of 0.008914948 secs
Upvotes: 1
Reputation: 20095
Another possible simpler solution could be:
df <- data.frame(c1 = c(7, 6, 8),
c2 = c(11, 9, 14),
c3 = c(15, 13, 20),
c48 = c(58, 46, 73))
lower.bounds <- c(10, 11, 12)
upper.bounds <- c(24, 35, 22)
ifelse(upper.bounds > df[] & lower.bounds < df[], 1, 0)
# Result:
# c1 c2 c3 c48
# [1,] 0 1 1 0
# [2,] 0 0 1 0
# [3,] 0 1 1 0
OR
as.data.frame(ifelse(upper.bounds > df[] & lower.bounds < df[], 1, 0))
# Result:
#
# c1 c2 c3 c48
# 1 0 1 1 0
# 2 0 0 1 0
# 3 0 1 1 0
Upvotes: 1
Reputation: 79328
since you said that the other variables are numeric, then we can do:
ifelse(t(upper.bounds-t(df[-1])>0&lower.bounds-t(df[-1])<0),1,0)
c2 c3 c4 c48
[1,] 0 0 1 0
[2,] 0 0 1 0
[3,] 0 1 1 0
There is no need of lapply
or forloop
where the data:
df=read.table(text=" c1 c2 c3 c4 c48
a 7 11 15 58
b 6 9 13 46
c 8 14 20 73
",h=T)
Upvotes: 2
Reputation: 5378
One method using dplyr:
# Data
df <- data.frame(id=letters[1:3], col2=c(7,6,8), col3=c(11,9,14), col4=c(15,13,20), col48=c(58,46,73))
# chain of operations
library(dplyr)
df %>%
mutate(upper = c(24, 35, 22), lower = c(10, 11, 12)) %>%
mutate_at(paste0("col", c(2:4, 48)), funs(.>=lower & .<=upper)) %>%
mutate_at(paste0("col", c(2:4, 48)), as.integer) %>%
select(-lower, -upper)
Output:
col1 col2 col3 col4 col48
1 a 0 1 1 0
2 b 0 0 1 0
3 c 0 1 1 0
Upvotes: 2
Reputation: 8770
You can avoid an explicit for
loop by using an implicit loop via lappy
that loops over all columns. I think that loop is not critical from a performance point-of-view if you loop over the columns but only if you loop over the rows (since R stores the elements of a column as vector in continuous memory locations so that the performance is optimal but the elements of each row are spreaded over the memory locations which causes a performance penalty to loop over rows 1 by 1):
df <- data.frame(c1 = c(7, 6, 8), c2 = c(11, 9, 14), c3 = c(15, 13, 20), c48 = c(58, 46, 73))
df
lower.bounds <- c(10, 11, 12) # , 11)
upper.bounds <- c(24, 35, 22) # , 63)
res <- lapply(df, function(col) {ifelse(col >= lower.bounds & col <= upper.bounds, 1, 0)})
as.data.frame(res)
# c1 c2 c3 c48
# 1 0 1 1 0
# 2 0 0 1 0
# 3 0 1 1 0
Upvotes: 1