Reputation: 411
I want to split a data frame like this:
chr.pos nt.pos CNV
1 74355 0
1 431565 0
1 675207 0
1 783605 1
1 888149 1
1 991311 1
1 1089305 1
1 1177669 1
1 1279886 0
1 1406311 0
1 1491385 0
1 1579761 0
2 1670488 1
2 1758800 1
2 1834256 0
2 1902924 1
2 1978088 1
2 2063124 0
The goal is to get a list of intervals where chr.pos is the same and CNV = 1, treating the CNV = 0 intervals between them as breaks:
[[1]]
1 783605 1
1 888149 1
1 991311 1
1 1089305 1
1 1177669 1
[[2]]
2 1670488 1
2 1758800 1
[[3]]
2 1902924 1
2 1978088 1
Any ideas?
Upvotes: 1
Views: 607
Reputation: 115485
You can use rle to create a grouping variable to use in split:
# create a group identifier: one id per run of identical CNV values
DF$GRP <- with(rle(DF$CNV), rep(seq_along(lengths), lengths))
# split the subset of DF that contains only CNV == 1, by run id
split(DF[DF$CNV == 1, ], DF[DF$CNV == 1, 'GRP'])
$`2`
chr.pos nt.pos CNV GRP
4 1 783605 1 2
5 1 888149 1 2
6 1 991311 1 2
7 1 1089305 1 2
8 1 1177669 1 2
$`4`
chr.pos nt.pos CNV GRP
13 2 1670488 1 4
14 2 1758800 1 4
$`6`
chr.pos nt.pos CNV GRP
16 2 1902924 1 6
17 2 1978088 1 6
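One caveat: the rle call above looks only at CNV, so a run of 1s that ends one chromosome and continues at the start of the next would land in a single group. That does not happen in the sample data, but the question asks for intervals "where the chr are the same". A minimal sketch of a chromosome-aware variant, assuming the column names shown in the question (the names `key`, `runs`, `hits` and `res` are hypothetical helpers, not part of the original answer):

```r
# Rebuild the question's data (column names taken from the question).
DF <- data.frame(
  chr.pos = c(rep(1, 12), rep(2, 6)),
  nt.pos  = c(74355, 431565, 675207, 783605, 888149, 991311, 1089305,
              1177669, 1279886, 1406311, 1491385, 1579761, 1670488,
              1758800, 1834256, 1902924, 1978088, 2063124),
  CNV     = c(0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0)
)

# Run-length encode the (chromosome, CNV) pair instead of CNV alone, so a
# run of 1s cannot continue across a chromosome boundary.
key  <- paste(DF$chr.pos, DF$CNV)
runs <- rle(key)
DF$GRP <- rep(seq_along(runs$lengths), runs$lengths)

# Keep only the CNV == 1 rows and split them by run id.
hits <- DF[DF$CNV == 1, ]
res  <- split(hits, hits$GRP)
```

On the sample data this should give the same three groups as the output above; the two approaches would only differ when adjacent chromosomes share a CNV = 1 boundary.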
Upvotes: 5