eschultz

Reputation: 43

Split single column data frame in R at NA

I have a large data set that I want to split into individual units. Right now the boundaries between units are marked by NA rows. How do I split the data at those rows? Sample set:

df <- matrix(c(1, 2, 3, 4, NA, 6, 7, 8, NA, 10, 11, 12), ncol = 1, byrow = TRUE)

gives us

      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4
 [5,]   NA
 [6,]    6
 [7,]    7
 [8,]    8
 [9,]   NA
[10,]   10
[11,]   11
[12,]   12

I would like these three stored in separate variables, such that

a
     [,1]
[1,]    1
[2,]    2
[3,]    3
[4,]    4
b
     [,1]
[1,]    6
[2,]    7
[3,]    8
c
     [,1]
[1,]   10
[2,]   11
[3,]   12

Does this make sense? Thanks.

Upvotes: 3

Views: 95

Answers (2)

agstudy

Reputation: 121618

A one-line solution using split and cumsum, after removing the missing values:

 split(df[!is.na(df)],cumsum(is.na(df))[!is.na(df)])
$`0`
[1] 1 2 3 4

$`1`
[1] 6 7 8

$`2`
[1] 10 11 12
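
To make the one-liner easier to follow, here is a step-by-step sketch of the same idea on the sample data from the question. The names is_sep, grp and parts are just illustrative, and naming the pieces a, b, c is an assumption based on what the question asked for.

```r
# Same sample data as in the question
df <- matrix(c(1, 2, 3, 4, NA, 6, 7, 8, NA, 10, 11, 12), ncol = 1)

is_sep <- is.na(df[, 1])   # TRUE on the NA separator rows
grp    <- cumsum(is_sep)   # group id goes up by 1 at every NA
# grp is: 0 0 0 0 1 1 1 1 2 2 2 2

# Drop the separator rows from both the values and the group ids, then split
parts <- split(df[!is_sep, 1], grp[!is_sep])

# Optional: rename the pieces to match the question (illustrative choice)
names(parts) <- c("a", "b", "c")
parts$b
```

Keeping the pieces in a named list is usually easier to work with than separate variables. If separate variables really are needed, list2env(parts, envir = globalenv()) should create them from the list names.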

Upvotes: 2

MrFlick

Reputation: 206606

I wasn't sure if by "data set" you meant a true matrix or a data.frame. Here's a data.frame example; a matrix would be similar.

df <- data.frame(a=c(1,2,3,4,NA,6,7,8,NA,10,11,12))
gg <- ifelse(is.na(df$a),NA, cumsum(is.na(df$a)))
split(df, gg)

Here gg is a grouping variable that counts up every time we see an NA, so it divides the rows into sections. We set gg to NA on the separator rows themselves, because split() drops rows whose grouping value is NA. Finally, split() with this grouping variable does what we want:

$`0`
  a
1 1
2 2
3 3
4 4

$`1`
  a
6 6
7 7
8 8

$`2`
    a
10 10
11 11
12 12
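
For the matrix case, a rough sketch of the same approach might look like the following. It assumes a one-column matrix like the one in the question; m and pieces are illustrative names. Splitting the row indices rather than the values lets each piece stay a matrix.

```r
# One-column matrix, as in the question
m <- matrix(c(1, 2, 3, 4, NA, 6, 7, 8, NA, 10, 11, 12), ncol = 1)

# Group id per row; NA on the separator rows so split() drops them
gg <- ifelse(is.na(m[, 1]), NA, cumsum(is.na(m[, 1])))

# Split the row indices by group, then subset the matrix for each group.
# drop = FALSE keeps each piece as a one-column matrix instead of a vector.
pieces <- lapply(split(seq_len(nrow(m)), gg),
                 function(i) m[i, , drop = FALSE])

pieces[["1"]]
```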

Upvotes: 1
