user324810

Reputation: 606

Create new column if value is within window range in the same column and same dataframe

I have a dataframe tmp

tmp <- structure(list(CHROM = c("1", "1", "1", "1", "1", "1", "1", "1", 
"1", "1", "1", "1"), POS = c(1014179L, 1014182L, 1014217L, 1014227L, 
1014228L, 1014229L, 1014231L, 1014276L, 1014359L, 1014401L, 1014422L, 
1014451L), exist = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0)), row.names = c(NA, 
12L), class = "data.frame")

PositionsIneed <- tmp$POS[tmp$exist == 1]   # positions of interest

# all positions within +/-3 of each position of interest (offsets 1:3 are recycled)
pos <- c(rep(PositionsIneed, each = 3) - 1:3, rep(PositionsIneed, each = 3) + 1:3)

tmp$exist2 <- ifelse(
  tmp$POS %in% pos,   # condition: POS is a +/-3 neighbour
  2,                  # value if TRUE
  0                   # value if FALSE
)

tmp
#   CHROM     POS exist exist2
#1      1 1014179     0      0
#2      1 1014182     0      0
#3      1 1014217     0      0
#4      1 1014227     0      2
#5      1 1014228     1      0
#6      1 1014229     0      2
#7      1 1014231     0      2
#8      1 1014276     0      0
#9      1 1014359     0      0
#10     1 1014401     0      0
#11     1 1014422     1      0
#12     1 1014451     0      0

I would like to create a new column exist3 that equals 3 only for rows where tmp$exist = 1 and the surrounding rows have tmp$exist2 = 2, so as to obtain:

#   CHROM     POS exist exist2 exist3
#1      1 1014179     0      0      0
#2      1 1014182     0      0      0
#3      1 1014217     0      0      0
#4      1 1014227     0      2      0
#5      1 1014228     1      0      3
#6      1 1014229     0      2      0
#7      1 1014231     0      2      0
#8      1 1014276     0      0      0
#9      1 1014359     0      0      0
#10     1 1014401     0      0      0
#11     1 1014422     1      0      0
#12     1 1014451     0      0      0

I saw this thread, but it was about two different dataframes. Also, unlike that question, my values are in the same column, within +3/-3 of the position of interest.

So, how can I create this new column with the given conditions?

It would be even better if there were a more straightforward way that does not require creating exist2.

Thank you in advance.

EDIT:

To make it clearer: I only want to flag rows where exist = 1, and only if other positions exist within +3/-3 of that position.

For example, POS 1014228 has 1014227, 1014229 and 1014231, which all fall within the +3/-3 window.

Whereas the POS 1014422 does not have any existing value within the range of +3/-3.

Upvotes: 1

Views: 425

Answers (2)

Darren Tsai

Reputation: 35554

Order the data by POS first, then check whether the absolute values of POS - lag(POS) and POS - lead(POS) are both less than or equal to 3.

library(dplyr)

tmp %>%
  arrange(POS) %>%
  mutate(exist3 = (exist == 1 &
         abs(POS - lag(POS)) <= 3 &
         abs(POS - lead(POS)) <= 3) * 3)

#    CHROM     POS exist exist3
# 1      1 1014179     0      0
# 2      1 1014182     0      0
# 3      1 1014217     0      0
# 4      1 1014227     0      0
# 5      1 1014228     1      3
# 6      1 1014229     0      0
# 7      1 1014231     0      0
# 8      1 1014276     0      0
# 9      1 1014359     0      0
# 10     1 1014401     0      0
# 11     1 1014422     1      0
# 12     1 1014451     0      0
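
For readers without dplyr, the same adjacent-gap idea can be sketched in base R with diff(). This is only a sketch under stated assumptions: the column exist3_any is a hypothetical addition for the reading where a neighbour on just one side is enough, which the question does not settle. Edge rows are treated here as having no neighbour on the missing side, whereas lag()/lead() can yield NA there. With more than one CHROM, the comparison would presumably have to be done per chromosome.

```r
# Sample data from the question
tmp <- data.frame(
  CHROM = rep("1", 12),
  POS = c(1014179L, 1014182L, 1014217L, 1014227L, 1014228L, 1014229L,
          1014231L, 1014276L, 1014359L, 1014401L, 1014422L, 1014451L),
  exist = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0)
)

tmp <- tmp[order(tmp$POS), ]        # sort so neighbours are adjacent rows
gap <- diff(tmp$POS)                # distance between consecutive positions
near_prev <- c(FALSE, gap <= 3)     # first row has no previous neighbour
near_next <- c(gap <= 3, FALSE)     # last row has no next neighbour

# neighbours required on both sides, as in the dplyr code above
tmp$exist3 <- (tmp$exist == 1 & near_prev & near_next) * 3

# assumed alternative: a neighbour on either side is enough
tmp$exist3_any <- (tmp$exist == 1 & (near_prev | near_next)) * 3
```

On the sample data both columns flag only POS 1014228, because 1014422 has no position within 3 on either side.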

Upvotes: 1

SimeonL

Reputation: 77

I am not quite sure what your conditions should be. If you want an index (exist3) that marks every row where exist = 1 or where POS lies within +/- 3 of such a row, then this should work:

# tmp$exist is numeric 0/1, so compare with == 1 to get a logical index into POS
tmp$exist3 <- apply(tmp, 1, function(x) ifelse(x[3]==1 | x[2]%in%c(sapply(c(tmp$POS[tmp$exist == 1]), function(y) y + seq(-3,3))), 3, 0))
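
A vectorized way to express that window test, as a rough sketch rather than a drop-in replacement: outer() builds the matrix of differences between every POS and every position with exist = 1, so no row-wise apply() (which coerces the mixed-type data frame to a character matrix) is needed. The names targets and in_window are made up for the illustration.

```r
# Sample data from the question
tmp <- data.frame(
  CHROM = rep("1", 12),
  POS = c(1014179L, 1014182L, 1014217L, 1014227L, 1014228L, 1014229L,
          1014231L, 1014276L, 1014359L, 1014401L, 1014422L, 1014451L),
  exist = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0)
)

# positions of interest (hypothetical helper name)
targets <- tmp$POS[tmp$exist == 1]

# one row per POS, one column per target; TRUE where |POS - target| <= 3
within3 <- abs(outer(tmp$POS, targets, "-")) <= 3

# a row is in the window if it is within 3 of at least one target
# (a target is at distance 0 from itself, so exist == 1 rows are included)
in_window <- rowSums(within3) > 0

tmp$exist3 <- ifelse(in_window, 3, 0)
```

On the sample data this marks POS 1014227, 1014228, 1014229, 1014231 and 1014422, i.e. the positions of interest together with their +/- 3 neighbours.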

Upvotes: 1
