Reputation: 43
I have a large data set that I want to split into individual units. Right now, the unit boundaries are marked by NA. How do I split on them? Sample set:
df=matrix(c(1,2,3,4,NA,6,7,8,NA,10,11,12),ncol=1,byrow=TRUE)
gives us
[,1]
[1,] 1
[2,] 2
[3,] 3
[4,] 4
[5,] NA
[6,] 6
[7,] 7
[8,] 8
[9,] NA
[10,] 10
[11,] 11
[12,] 12
I would like these three stored in separate variables, such that
a
[,1]
[1,] 1
[2,] 2
[3,] 3
[4,] 4
b
[,1]
[1,] 6
[2,] 7
[3,] 8
c
[,1]
[1,] 10
[2,] 11
[3,] 12
Does this make sense? Thanks.
Upvotes: 3
Views: 95
Reputation: 121618
One-line solution using split() and cumsum() after removing missing values:
split(df[!is.na(df)],cumsum(is.na(df))[!is.na(df)])
$`0`
[1] 1 2 3 4
$`1`
[1] 6 7 8
$`2`
[1] 10 11 12
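To see why this works, here is the same idea unpacked into steps. This is only a sketch: the intermediate names is_sep, grp and pieces are made up for illustration, and the names a, b, c are simply the ones used in the question.

```r
df <- matrix(c(1, 2, 3, 4, NA, 6, 7, 8, NA, 10, 11, 12), ncol = 1)

is_sep <- is.na(df[, 1])   # TRUE on the NA separator rows
grp    <- cumsum(is_sep)   # running count of NAs seen so far: 0 0 0 0 1 1 1 1 2 2 2 2

# Drop the separator rows from both the values and the group ids, then split
pieces <- split(df[!is_sep, 1], grp[!is_sep])

# Optional: rename the groups to match the question
names(pieces) <- c("a", "b", "c")
pieces$b
```

Keeping the result as a list is usually easier to work with than separate variables. If separate variables are really wanted, list2env(pieces, envir = globalenv()) should create them from the named list.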
Upvotes: 2
Reputation: 206606
I wasn't sure if by "data set" you meant a true matrix or a data.frame. Here's a data.frame example; a matrix would be similar.
df <- data.frame(a=c(1,2,3,4,NA,6,7,8,NA,10,11,12))
gg <- ifelse(is.na(df$a),NA, cumsum(is.na(df$a)))
split(df, gg)
Here gg is a grouping variable that increases by one each time we see an NA, so each section between NAs gets its own group number. It is set to NA on the NA rows themselves, so that split() drops those rows. Finally, split() with this grouping variable does what we want.
$`0`
a
1 1
2 2
3 3
4 4
$`1`
a
6 6
7 7
8 8
$`2`
a
10 10
11 11
12 12
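For a true matrix, one possible variant is to split the row indices rather than the matrix itself, then subset with drop = FALSE so each piece stays a one-column matrix. This is a sketch under the assumption that the data is a single-column matrix as in the question; the names m, sep and parts are illustrative only.

```r
m <- matrix(c(1, 2, 3, 4, NA, 6, 7, 8, NA, 10, 11, 12), ncol = 1)

sep <- is.na(m[, 1])
gg  <- ifelse(sep, NA, cumsum(sep))   # NA on separator rows, group id elsewhere

# split() drops the row indices whose group is NA;
# drop = FALSE keeps each piece as a matrix instead of a plain vector
parts <- lapply(split(seq_len(nrow(m)), gg), function(i) m[i, , drop = FALSE])

parts[["1"]]
```

Splitting indices instead of values also carries over to matrices with several columns, provided the separator rows can be detected from one column.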
Upvotes: 1