Reputation: 606
I have a data frame tmp:
tmp <- structure(list(CHROM = c("1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1"), POS = c(1014179L, 1014182L, 1014217L, 1014227L,
1014228L, 1014229L, 1014231L, 1014276L, 1014359L, 1014401L, 1014422L,
1014451L), exist = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0)), row.names = c(NA,
12L), class = "data.frame")
PositionsIneed <- tmp$POS[which(tmp$exist %in% 1)]  # positions of interest
pos <- c(rep(PositionsIneed, each = 3) - 1:3,
         rep(PositionsIneed, each = 3) + 1:3)       # all positions within +3/-3
tmp$exist2 <- ifelse(
  tmp$POS %in% pos[which(pos %in% tmp$POS)],  # condition
  2,                                          # TRUE
  0                                           # FALSE
)
tmp
#   CHROM     POS exist exist2
#1      1 1014179     0      0
#2      1 1014182     0      0
#3      1 1014217     0      0
#4      1 1014227     0      2
#5      1 1014228     1      0
#6      1 1014229     0      2
#7      1 1014231     0      2
#8      1 1014276     0      0
#9      1 1014359     0      0
#10     1 1014401     0      0
#11     1 1014422     1      0
#12     1 1014451     0      0
I would like to create a new column exist3 that is 3 only for rows where tmp$exist = 1 and the surrounding rows have tmp$exist2 = 2, so as to obtain:
#   CHROM     POS exist exist2 exist3
#1      1 1014179     0      0      0
#2      1 1014182     0      0      0
#3      1 1014217     0      0      0
#4      1 1014227     0      2      0
#5      1 1014228     1      0      3
#6      1 1014229     0      2      0
#7      1 1014231     0      2      0
#8      1 1014276     0      0      0
#9      1 1014359     0      0      0
#10     1 1014401     0      0      0
#11     1 1014422     1      0      0
#12     1 1014451     0      0      0
I saw this thread, but it was about two different data frames. Also, unlike that question, my values are in the same column, within +3/-3 of the position of interest.
So, how can I create this new column with the given conditions? It would be even better if there were a more straightforward way than creating exist2.
Thank you in advance.
To make it clearer, I only want to retrieve rows where exist = 1, and only if there are existing values within +3/-3 of that position.
For example, the POS 1014228 has 1014227, 1014229 and 1014231, which fall in the window of +3/-3, whereas the POS 1014422 does not have any existing value within the range of +3/-3.
Upvotes: 1
Views: 425
Reputation: 35554
Order the data by POS first, then check whether the absolute values of POS - lag(POS) and POS - lead(POS) are both less than or equal to 3.
library(dplyr)

tmp %>%
  arrange(POS) %>%
  # flag rows with exist == 1 whose previous AND next positions are both within 3
  mutate(exist3 = (exist == 1 &
                     abs(POS - lag(POS)) <= 3 &
                     abs(POS - lead(POS)) <= 3) * 3)
#    CHROM     POS exist exist3
# 1      1 1014179     0      0
# 2      1 1014182     0      0
# 3      1 1014217     0      0
# 4      1 1014227     0      0
# 5      1 1014228     1      3
# 6      1 1014229     0      0
# 7      1 1014231     0      0
# 8      1 1014276     0      0
# 9      1 1014359     0      0
# 10     1 1014401     0      0
# 11     1 1014422     1      0
# 12     1 1014451     0      0
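The condition above requires both the previous and the next position to be within 3. If a single neighbour on either side should be enough (the question can be read that way), a base R sketch along the following lines may work. The helper name has_neighbour and its window argument are made up for illustration and are not part of any package. Duplicated positions, if present, would count as neighbours of each other.

```r
# Example data from the question
tmp <- data.frame(
  CHROM = "1",
  POS = c(1014179L, 1014182L, 1014217L, 1014227L, 1014228L, 1014229L,
          1014231L, 1014276L, 1014359L, 1014401L, 1014422L, 1014451L),
  exist = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0)
)

# Hypothetical helper: TRUE where at least one *other* position
# lies within `window` of the current one
has_neighbour <- function(pos, window = 3) {
  vapply(seq_along(pos), function(i) {
    any(abs(pos[-i] - pos[i]) <= window)
  }, logical(1))
}

# 3 for rows of interest that have a nearby position, 0 otherwise
tmp$exist3 <- ifelse(tmp$exist == 1 & has_neighbour(tmp$POS), 3, 0)
tmp$exist3
# [1] 0 0 0 0 3 0 0 0 0 0 0 0
```

This does not rely on the row order and does not need the intermediate exist2 column, but it compares every position with every other one, so it may be slow on very large data.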
Upvotes: 1
Reputation: 77
I am not quite sure what your conditions should be. If you want an index (exist3) that indicates whether a POS either has exist = 1 or lies within +/- 3 of such a position, then this should work:
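# positions with exist == 1 (note the == 1: indexing with the raw 0/1 values would pick the wrong rows)
centres <- tmp$POS[tmp$exist == 1]
# every position from -3 to +3 around each centre, including the centre itself
window <- c(sapply(centres, function(y) y + seq(-3, 3)))
tmp$exist3 <- ifelse(tmp$exist == 1 | tmp$POS %in% window, 3, 0)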
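To see what this interpretation gives on the example data, here is a small sketch using a distance matrix built with outer(). The variable names centres and near are illustrative only. Under this reading, the positions of interest and every position within 3 of one are all flagged, which differs from the output requested in the question.

```r
# Example data from the question
tmp <- data.frame(
  CHROM = "1",
  POS = c(1014179L, 1014182L, 1014217L, 1014227L, 1014228L, 1014229L,
          1014231L, 1014276L, 1014359L, 1014401L, 1014422L, 1014451L),
  exist = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0)
)

# Positions of interest
centres <- tmp$POS[tmp$exist == 1]

# near[i, j] is TRUE when POS[i] is within 3 of centres[j]
# (a centre is at distance 0 from itself, so it is included)
near <- abs(outer(tmp$POS, centres, "-")) <= 3

# 3 where the row is close to at least one centre, 0 otherwise
tmp$exist3 <- ifelse(rowSums(near) > 0, 3, 0)
tmp$exist3
# [1] 0 0 0 3 3 3 3 0 0 0 3 0
```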
Upvotes: 1