Reputation: 6874
I have a dataframe as follows:
chr leftPos ZScore1 ZScore2 ZScore3 ZScore4
1 24352 34 43 19 43
1 53534 2 1 -1 -9
2 34 -15 7 -9 -18
3 3443 -100 -4 4 -9
3 3445 -100 -1 6 -1
3 3667 5 -5 9 5
3 7882 -8 -9 1 3
I would like to only keep those rows which have the same chr and have adjacent columns with the ZScore going in the same direction. In other words a row should remain if the row before or after for that chr has the same sign (positive or negative). I would like this to run for all columns with ZS in the column name so that the output ends up just being a number of rows that fulfil the criteria for each row.
For one column the code should result in:
chr leftPos ZScore
1 24352 34
1 53534 2
3 3443 -100
3 3445 -100
but the final output should look like
ZScore1 ZScore2 ZScore3 ZScore4
nrow 4 6 4 4
I have tried bits of code but Im not even really sure how to approach this.
I guess I would group by chr and then see if the row above was the same positive or negative as the current row and then see if the row below was the same direction as the current row. Then move to the next row for that chr.
Upvotes: 5
Views: 3320
Reputation: 887058
An option using the devel version of data.table
(similar to the approach in @dimitris_ps post). Instructions to install the devel version are here
library(data.table)#v1.9.5
na.omit(setDT(df)[, {tmp= sign(ZScore)
.SD[tmp==shift(tmp) | tmp==shift(tmp, type='lead')] },
by=chr])
# chr leftPos ZScore
#1: 1 24352 34
#2: 1 53534 2
#3: 3 3443 -100
#4: 3 3445 -100
We could create a function
f1 <- function(dat, ZCol){
na.omit(as.data.table(dat)[, {tmp = sign(eval(as.name(ZCol)))
.SD[tmp==shift(tmp) | tmp==shift(tmp, type='lead')]},
by=chr])[, list(.N)]}
nm1 <- paste0('ZScore', 1:4)
setnames(do.call(cbind,lapply(nm1, function(x) f1(df1, x))), nm1)[]
# ZScore1 ZScore2 ZScore3 ZScore4
#1: 4 6 4 4
Or we can use set
res <- as.data.table(matrix(0, ncol=4, nrow=1,
dimnames=list(NULL, nm1)))
for(j in seq_along(nm1)){
set(res, i=NULL, j=j, value=f1(df1,nm1[j]))
}
res
# ZScore1 ZScore2 ZScore3 ZScore4
#1: 4 6 4 4
df <- structure(list(chr = c(1L, 1L, 2L, 3L, 3L, 3L, 3L),
leftPos = c(24352L,
53534L, 34L, 3443L, 3445L, 3667L, 7882L), ZScore = c(34L, 2L,
-15L, -100L, -100L, 5L, -8L)), .Names = c("chr", "leftPos", "ZScore"
), class = "data.frame", row.names = c(NA, -7L))
df1 <- structure(list(chr = c(1L, 1L, 2L, 3L, 3L, 3L, 3L),
leftPos = c(24352L,
53534L, 34L, 3443L, 3445L, 3667L, 7882L), ZScore1 = c(34L, 2L,
-15L, -100L, -100L, 5L, -8L), ZScore2 = c(43L, 1L, 7L, -4L, -1L,
-5L, -9L), ZScore3 = c(19L, -1L, -9L, 4L, 6L, 9L, 1L),
ZScore4 = c(43L,
-9L, -18L, -9L, -1L, 5L, 3L)), .Names = c("chr", "leftPos",
"ZScore1", "ZScore2", "ZScore3", "ZScore4"), class = "data.frame",
row.names = c(NA, -7L))
Upvotes: 3
Reputation: 92282
Here's a possible data.table
solution which uses rleid
from the dev version
setDT(df)[, indx := .N, by = .(chr, rleid(sign(ZScore)))][indx > 1L]
# chr leftPos ZScore indx
# 1: 1 24352 34 2
# 2: 1 53534 2 2
# 3: 3 3443 -100 2
# 4: 3 3445 -100 2
Edit (per new data)
indx <- paste0('ZScore', 1:4)
temp <- setDT(df)[, lapply(.SD, function(x) rleid(sign(x))), .SDcols = indx, by = chr]
Res <- setNames(numeric(length(indx)), indx)
for (i in indx) Res[i] <- length(temp[, .I[.N > 1L], by = c("chr", i)]$V1)
Res
# ZScore1 ZScore2 ZScore3 ZScore4
# 4 6 4 4
Upvotes: 3
Reputation: 4537
This is a loop which does what you want. No fancy packages. Just check if the row behind and the row ahead match - if they do, move on, otherwise, strip the row and check the same position.
chr = c(1,1,2,3,3,3,3,3)
mat = cbind(chr,rnorm(8))
i = 1
while(i <= nrow(mat)){
if (mat[max(i-1,1),1] != mat[i,1] & mat[min(nrow(mat),i+1),1] != mat[i,1]){
mat = mat[-i,]
} else {
i = i+1
}
}
Upvotes: 0
Reputation: 5951
Try this with the package dplyr
library(dplyr)
Data
df <- data.frame(chr=c(1, 1, 2, 3, 3, 3, 3),
leftPos=c(24352, 53534, 34, 3443, 3445, 3667, 7882),
ZScore=c(34, 2, -15, -100, -100, 5, -8))
Code
df %>% group_by(chr) %>%
filter(sign(ZScore)==sign(lag(ZScore)) | sign(ZScore)==sign(lead(ZScore))) %>%
ungroup
Upvotes: 7