Reputation: 18239
I have a data.frame which contain 3 columns named start
, end
and width
. Each line represent a segment over a 1D space with a start, and end and a width such as the "width = end - start + 1"
Here is an example
d = data.frame(
start = c(12, 50, 100, 130, 190),
end = c(16, 80, 102, 142, 201)
)
d$width = d$end - d$start + 1
print(d)
start end width
1 12 16 5
2 50 80 31
3 100 102 3
4 130 142 13
5 190 201 12
Consider two breakpoints and a factor of division
UpperPos = 112
LowerPos = 61
factor = 2
I would like to reduce the width of each segment outside the two breakpoints so that to reduce their width by a factor of factor
. If a segment overlaps a breakpoint, then only the part of the segment that is outside this breakpoint should be reduced in width. In addition, the width of each segment must be a multiple of 3 and must be of non-zero length.
Here is my current function that "squeeze" the segments
squeeze = function(d, factor, LowerPos, UpperPos)
{
for (row in 1:nrow(d))
{
if (d[row,]$end <= LowerPos | d[row,]$end >= UpperPos) # Complete squeeze
{
middlePos = round(d[row,]$start + d[row,]$width/2)
d[row,]$width = round(d[row,]$width / factor)
d[row,]$width = d[row,]$width - d[row,]$width %% 3 + 3
d[row,]$start = round(middlePos - d[row,]$width/2)
d[row,]$end = d[row,]$start + d[row,]$width -1
} else if (d[row,]$start <= LowerPos & d[row,]$end >= LowerPos) # Partial squeeze (Lower)
{
d[row,]$start = round(LowerPos - (LowerPos - d[row,]$start)/factor)
d[row,]$width = d[row,]$end - d[row,]$start + 1
if (d[row,]$width %% 3 != 0)
{
add = 3 - d[row,]$width %% 3
d[row,]$width = d[row,]$width + add
d[row,]$start = d[row,]$start - add
}
} else if (d[row,]$start >= UpperPos & d[row,]$end <= UpperPos) # Partial squeeze (Upper)
{
d[row,]$end = round(UpperPos + (d[row,]$end - UpperPos)/factor)
d[row,]$width = d[row,]$end - d[row,]$start + 1
if (d[row,]$width %% 3 != 0)
{
add = 3 - d[row,]$width %% 3
d[row,]$width = d[row,]$width + add
d[row,]$end = d[row,]$start + add
}
} else if (!(d[row,]$end < UpperPos & d[row,]$start > LowerPos) )
{
print(d)
print(paste("row is ",row))
print(paste("LowerPos is ",LowerPos))
print(paste("UpperPos is ",UpperPos))
stop("In MyRanges_squeeze: Should not run this line!")
}
}
return(d)
}
and it returns the expected output
squeeze(d)
start end width
1 12 14 3
2 54 80 27
3 100 102 3
4 132 140 9
5 192 200 9
However, my function squeeze
is way too slow. Can you help me to improve it?
Upvotes: 0
Views: 250
Reputation: 7455
Note that this answer only addresses how one may speed up your function, which is what you asked in your question, and not the validity of your logic with respect to your requirements.
As far as I can tell, all of your operations use vectorized operators. So, there is no need to loop over rows in squeeze
. In the following, I have encapsulated all of your code that is within the if-else
blocks as separate vectorized functions:
## This computes the case where d$end <= LowerPos | d$end >= UpperPos
f1 <- function(d, factor) {
middlePos = round(d$start + d$width/2)
d$width = round(d$width / factor)
d$width = d$width - d$width %% 3 + 3
d$start = round(middlePos - d$width/2)
d$end = d$start + d$width -1
d
}
## This is used below in f2
f4 <- function(d) {
add = 3 - d$width %% 3
d$width = d$width + add
d$start = d$start - add
d
}
## This computes the case where d$start <= LowerPos & d$end >= LowerPos
f2 <- function(d, factor, LowerPos) {
d$start = round(LowerPos - (LowerPos - d$start)/factor)
d$width = d$end - d$start + 1
ifelse(d$width %% 3 != 0, f4(d), d)
}
## This is used below in f3
f5 <- function(d) {
add = 3 - d$width %% 3
d$width = d$width + add
d$end = d$start + add
d
}
## This computes the case where d$start >= UpperPos & d$end <= UpperPos
f3 <- function(d, factor, UpperPos) {
d$end = round(UpperPos + (d$end - UpperPos)/factor)
d$width = d$end - d$start + 1
ifelse (d$width %% 3 != 0, f5(d), d)
}
Now, in squeeze
, we use f1
, f2
, and f3
to compute the squeeze for all three cases separately. We also include the case for no squeeze as just d
. We then rbind
them to one big data frame, dd
. Now, all we need is to pick the correct row from each block of rows (each of size nrow(d)
) in dd
based on the case for that row. For this, we compute a ind
for the case (i.e., 1
to 4
) using a series of ifelse
's. The value of ind
is the block to chose from, and its position is the row from that block to choose from. We use this to subset dd
to get the output.
squeeze <- function(d, factor, LowerPos, UpperPos) {
d1 <- f1(d, factor)
d2 <- f2(d, factor, LowerPos)
d3 <- f3(d, factor, UpperPos)
dd <- do.call(rbind,list(d1,d2,d3,d))
ind <- ifelse(d$end <= LowerPos | d$end >= UpperPos, 1,
ifelse(d$start <= LowerPos & d$end >= LowerPos, 2,
ifelse(d$start >= UpperPos & d$end <= UpperPos, 3, 4)))
dd[(ind-1) * nrow(d) + 1:nrow(d),]
}
Using this version, the result is the same as yours:
out <- squeeze(d, factor, LowerPos, UpperPos)
## start end width
##1 12 14 3
##7 54 80 27
##18 100 102 3
##4 132 140 9
##5 192 200 9
Upvotes: 1