Reputation: 178
I have binned my data in intervals (of 100000) using 2 different frames: from 0 to 100000 and onwards, and from 50000 to 150000 and onwards. I then joined both dataframes, using one column as identifier for the frames (represented in column "x100kb").
For my purpose, if 2 rows (edit: they don't need to be sequent to each other; since the data is not ordered by "chr" and "x100kb" right now) differ in "x100kb" by 0.5 (preferably comparing whole numbers to their +0.5; eg: 60 to 60.5, 65 to 65.5; etc) but they have the same values in "chr" and "occurrences_norm" and "occurrences_tum"; then they are equal and I want to remove one of them. The only thing coming to mind now are loops, which obviusly is not very productive...
Data example:
chr x100Kb occurrences_norm occurrences_tum fold
19064 chr17 61.5 17 0 14.05333
38799 chr5 526.0 16 0 13.96587
38800 chr5 526.5 16 0 13.96587
39946 chr5 1113.5 16 0 13.96587
2377 chr1 1426.0 15 0 13.87277
21859 chr18 733.5 15 0 13.87277
20538 chr18 24.0 14 0 13.77324
21863 chr18 735.5 14 0 13.77324
37699 chr4 1835.5 14 0 13.77324
39924 chr5 1102.5 14 0 13.77324
21506 chr18 550.5 13 0 13.66633
21862 chr18 735.0 13 0 13.66633
22258 chr19 151.5 13 0 13.66633
38972 chr5 613.0 13 0 13.66633
41707 chr6 194.5 13 0 13.66633
2380 chr1 1427.5 12 0 13.55087
20541 chr18 25.5 12 0 13.55087
21252 chr18 421.0 12 0 13.55087
27384 chr2 2243.0 12 0 13.55087
39990 chr5 1135.5 12 0 13.55087
In the example, the 3rd row would be removed.
Upvotes: 1
Views: 77
Reputation: 23574
I read the question in a different way. I thought we need to compare any two sequent rows. For example, check row 1 & 2, row 2 & 3, and so on. I also thought that the condition is the difference in x100Kb is 0.5, not large than 0.5. I thought running four logical checks, using shift()
, would be one way to achieve the goal.
setDT(df1)[!((abs(x100Kb - shift(x100Kb, type = "lag", fill = -Inf)) == 0.5) &
(chr == shift(chr, type = "lag")) &
(occurrences_norm == shift(occurrences_norm, type = "lag")) &
(occurrences_tum == shift(occurrences_tum, type = "lag")))
]
# chr x100Kb occurrences_norm occurrences_tum fold
# 1: chr17 61.5 17 0 14.05333
# 2: chr5 526.0 16 0 13.96587
# 3: chr5 1113.5 16 0 13.96587
# 4: chr1 1426.0 15 0 13.87277
# 5: chr18 733.5 15 0 13.87277
# 6: chr18 24.0 14 0 13.77324
# 7: chr18 735.5 14 0 13.77324
# 8: chr4 1835.5 14 0 13.77324
# 9: chr5 1102.5 14 0 13.77324
#10: chr18 550.5 13 0 13.66633
#11: chr18 735.0 13 0 13.66633
#12: chr19 151.5 13 0 13.66633
#13: chr5 613.0 13 0 13.66633
#14: chr6 194.5 13 0 13.66633
#15: chr1 1427.5 12 0 13.55087
#16: chr18 25.5 12 0 13.55087
#17: chr18 421.0 12 0 13.55087
#18: chr2 2243.0 12 0 13.55087
#19: chr5 1135.5 12 0 13.55087
Upvotes: 2
Reputation: 887078
We could also the data.table
library(data.table)
setDT(df1)[df1[, .I[abs(x100Kb - shift(x100Kb, fill = -Inf)) > 0.5] ,
by = .(chr, occurrences_norm, occurrences_tum)]$V1]
# chr x100Kb occurrences_norm occurrences_tum fold
# 1: chr17 61.5 17 0 14.05333
# 2: chr5 526.0 16 0 13.96587
# 3: chr5 1113.5 16 0 13.96587
# 4: chr1 1426.0 15 0 13.87277
# 5: chr18 733.5 15 0 13.87277
# 6: chr18 24.0 14 0 13.77324
# 7: chr18 735.5 14 0 13.77324
# 8: chr4 1835.5 14 0 13.77324
# 9: chr5 1102.5 14 0 13.77324
#10: chr18 550.5 13 0 13.66633
#11: chr18 735.0 13 0 13.66633
#12: chr19 151.5 13 0 13.66633
#13: chr5 613.0 13 0 13.66633
#14: chr6 194.5 13 0 13.66633
#15: chr1 1427.5 12 0 13.55087
#16: chr18 25.5 12 0 13.55087
#17: chr18 421.0 12 0 13.55087
#18: chr2 2243.0 12 0 13.55087
#19: chr5 1135.5 12 0 13.55087
Upvotes: 1
Reputation: 3587
Try this using dplyr
package
library(dplyr)
df1 %>% group_by(chr,occurrences_norm,occurrences_tum) %>%
mutate(tmp=diff(c(0,x100Kb))) %>% filter(tmp>0.5) %>% select(-tmp)
# chr x100Kb occurrences_norm occurrences_tum fold
# (fctr) (dbl) (int) (int) (dbl)
# 1 chr17 61.5 17 0 14.05333
# 2 chr5 526.0 16 0 13.96587
# 3 chr5 1113.5 16 0 13.96587
# 4 chr1 1426.0 15 0 13.87277
# 5 chr18 733.5 15 0 13.87277
# 6 chr18 24.0 14 0 13.77324
# 7 chr18 735.5 14 0 13.77324
# 8 chr4 1835.5 14 0 13.77324
# 9 chr5 1102.5 14 0 13.77324
# 10 chr18 550.5 13 0 13.66633
# 11 chr18 735.0 13 0 13.66633
# 12 chr19 151.5 13 0 13.66633
# 13 chr5 613.0 13 0 13.66633
# 14 chr6 194.5 13 0 13.66633
# 15 chr1 1427.5 12 0 13.55087
# 16 chr18 25.5 12 0 13.55087
# 17 chr18 421.0 12 0 13.55087
# 18 chr2 2243.0 12 0 13.55087
# 19 chr5 1135.5 12 0 13.55087
Upvotes: 0