Reputation: 6874
I have a dataframe as below. The first column is positions and the last is level. I want to output the range of rows with the same number in adjacent rows. Anything with '2' should be ignored. Can you help
The input:
1 3
10000 3
20000 3
30000 1
40000 2
50000 2
60000 2
70000 3
80000 1
90000 1
The output
1- 2999 3
3000-3999 1
7000-7999 3
8000-9999 1
Upvotes: 0
Views: 219
Reputation: 206187
Here's a method using the chaning functions of dplyr
. Here's the sample data
dd <- structure(list(pos = c(1L, 10000L, 20000L, 30000L, 40000L, 50000L,
60000L, 70000L, 80000L, 90000L), level = c(3L, 3L, 3L, 1L, 2L,
2L, 2L, 3L, 1L, 1L)), .Names = c("pos", "level"), class = "data.frame",
row.names = c(NA, -10L))
dd <- dd[order(dd$pos), ] #make sure its sorted by position
If the difference to your next pos is always 10000, you can do
library(dplyr)
dd %>% arrange(pos) %>%
mutate(run=cumsum(c(0,diff(level))!=0)) %>%
subset(level!=2) %>%
group_by(run) %>%
summarise(level=max(level), start=min(pos), end=max(pos)+9999) %>%
select(-run)
# level start end
# 1 3 1 29999
# 2 1 30000 39999
# 3 3 70000 79999
# 4 1 80000 99999
otherwise
dd %>% arrange(pos) %>%
mutate(run=cumsum(c(0,diff(level))!=0), nextpos=lead(pos)) %>%
subset(level!=2) %>%
group_by(run) %>%
summarise(level=max(level), start=min(pos), end=max(nextpos)-1) %>%
select(-run)
# level start end
# 1 3 1 29999
# 2 1 30000 39999
# 3 3 70000 79999
# 4 1 80000 NA
can calculate the distance to the next group for all but the last group.
Upvotes: 1