I would like a function that splits a data frame like this one:
q1 q2 q3 q4
1 4 0 33
8 5 33 44
na na na na
na na na na
3 33 2 66
4 2 3 88
6 44 5 99
The result should be two data frames:
d1
q1 q2 q3 q4
1 4 0 33
8 5 33 44
and
d2
q1 q2 q3 q4
3 33 2 66
4 2 3 88
6 44 5 99
The number of observations (rows) in d1 and d2 is not fixed: we do not know in advance how many rows each piece contains, or how many rows are NA.
Suppose DF is the data frame. Since the splitting criterion wasn't specified precisely, let's assume that any row consisting entirely of NAs is a dividing row. If it's some other criterion, change the first line appropriately:
isNA <- apply(is.na(DF), 1, all)   # TRUE for rows that are entirely NA
split(DF[ !isNA, ], cumsum( isNA )[ !isNA ])   # block id = number of NA rows seen so far
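Since the question asks for a function, here is a minimal sketch that wraps the two lines above. The name split_on_na_rows is hypothetical, and the example assumes the literal "na" strings in the question's data should be read as real NA values (via na.strings = "na").

```r
# Hypothetical helper: treat every all-NA row as a separator and
# return the remaining blocks of rows as an unnamed list.
split_on_na_rows <- function(DF) {
  isNA   <- apply(is.na(DF), 1, all)        # TRUE for all-NA rows
  groups <- cumsum(isNA)[!isNA]             # block id for each kept row
  pieces <- split(DF[!isNA, , drop = FALSE], groups)
  unname(pieces)                            # access as pieces[[1]], pieces[[2]], ...
}

# The question's data, with "na" converted to NA on input
DF <- read.table(header = TRUE, na.strings = "na", text = "
q1 q2 q3 q4
1 4 0 33
8 5 33 44
na na na na
na na na na
3 33 2 66
4 2 3 88
6 44 5 99")

res <- split_on_na_rows(DF)
d1 <- res[[1]]   # rows 1-2
d2 <- res[[2]]   # rows 5-7
```

Because the block id comes from cumsum(), the function works for any number of separator rows, consecutive or not, so the sizes of the pieces do not need to be known in advance.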
First, read in your data so that "na" gets converted to actual NA values.
mydf <- read.table(
header = TRUE,
na.strings="na",
text = "q1 q2 q3 q4
1 4 0 33
8 5 33 44
na na na na
3 33 2 66
4 2 3 88
6 44 5 99")
Second, figure out where to split your data.frame:
# Find the rows where *all* the values are `NA`
RLE <- rle(rowSums(is.na(mydf)) == ncol(mydf))$lengths
# Use that to create "groups" of rows
RLE2 <- rep(seq_along(RLE), RLE)
# Set the even-numbered runs (the all-NA blocks) to NA -- we don't want them.
# This assumes the data starts with a row that is not all NA.
RLE2[RLE2 %% 2 == 0] <- NA
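The odd/even rule holds only when the first row is not all NA; if the data begins with an NA row, the parity flips and the NA rows are kept instead. A minimal variant, assuming the same layout, marks the all-NA rows directly. The sample data below is made up from the question's values purely to show that edge case.

```r
# Illustrative data that *starts* with an all-NA row
mydf <- read.table(header = TRUE, na.strings = "na", text = "
q1 q2 q3 q4
na na na na
1 4 0 33
8 5 33 44
na na na na
3 33 2 66")

allNA <- rowSums(is.na(mydf)) == ncol(mydf)  # TRUE for separator rows
runs  <- rle(allNA)$lengths                  # lengths of alternating runs
RLE2  <- rep(seq_along(runs), runs)          # run id for every row
RLE2[allNA] <- NA                            # drop separator rows wherever they occur

pieces <- split(mydf, RLE2)                  # NA group ids are dropped by split()
```

Here pieces holds two data frames (named "2" and "4" after their run ids) containing rows 2-3 and row 5, whereas the parity rule would have kept the NA rows.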
Third, split your data.frame:
split(mydf, RLE2)
# $`1`
# q1 q2 q3 q4
# 1 1 4 0 33
# 2 8 5 33 44
#
# $`3`
# q1 q2 q3 q4
# 4 3 33 2 66
# 5 4 2 3 88
# 6 6 44 5 99
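Because the number of pieces is not known in advance, one option is to name them d1, d2, ... programmatically rather than assigning each by hand. This sketch repeats the steps above and additionally renumbers each piece's rows from 1; the naming scheme is an assumption chosen to match the question.

```r
mydf <- read.table(header = TRUE, na.strings = "na", text = "
q1 q2 q3 q4
1 4 0 33
8 5 33 44
na na na na
3 33 2 66
4 2 3 88
6 44 5 99")

RLE  <- rle(rowSums(is.na(mydf)) == ncol(mydf))$lengths
RLE2 <- rep(seq_along(RLE), RLE)
RLE2[RLE2 %% 2 == 0] <- NA                 # drop the all-NA runs

pieces <- split(mydf, RLE2)
names(pieces) <- paste0("d", seq_along(pieces))   # d1, d2, ... however many there are
pieces <- lapply(pieces, function(p) {            # lapply keeps the names
  rownames(p) <- NULL                             # renumber rows 1..n
  p
})
```

The pieces are then available as pieces$d1 and pieces$d2.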
However, this is all somewhat guesswork, because your statement that "This means that we do not know the obs in the dataframe and how many obs are NAs" is not really clear. Here, I've made the assumption that you want to split the data whenever you encounter a full row of NA values.