Grouping columns with same missing value patterns in R

Question

Suppose I have a data frame (df) with missing values (NA):

df:

head1    head2  head3   head4  head5
-----    -----  -----   -----  -----
65       25     12      65     76
78       5      NA      12     NA
NA       NA     12      5      51
76       32     6       94     11
67       32     NA      1      NA
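To make the example reproducible, here is a minimal sketch that enters the table above as a data frame. The values are copied from the table. It also lists the row positions of the NAs in each column, which makes the "same NA pattern" being asked about explicit. The name na_rows is just an illustrative variable.

```r
# Rebuild the table above as a data frame (values copied from the question)
df <- data.frame(
  head1 = c(65, 78, NA, 76, 67),
  head2 = c(25,  5, NA, 32, 32),
  head3 = c(12, NA, 12,  6, NA),
  head4 = c(65, 12,  5, 94,  1),
  head5 = c(76, NA, 51, 11, NA)
)

# Row positions of the NAs in each column; columns whose vectors are
# identical here share the same missing-value pattern
na_rows <- lapply(df, function(col) which(is.na(col)))
na_rows
```

head1 and head2 both give 3, head3 and head5 both give 2 5, and head4 gives integer(0), which matches the three groups listed below.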

I want to create a list (list1) in which each element is a data frame made up of the columns that share the same NA pattern.

For this example:

list1[[1]] should be a data frame (df1) with columns df$head1 and df$head2
list1[[2]] should be a data frame (df2) with columns df$head3 and df$head5
list1[[3]] should be a data frame (df3) with column df$head4

How can I create such a list in R? Any help would be appreciated. Thanks a lot.

@akrun, I realized that your code works for data frames where the NAs in each row all belong to one group of columns, but it does not work for the data frame below, where row 3 has NAs in head1, head2 and head3 while head3 is also NA in row 5.

df1<-data.frame(head1=c(65,78,NA,76,67),
                head2=c(25,5,NA,32,32),
                head3=c(12,12,NA,6,NA),
                head4=c(65,12,5,94,1),
                head5=c(76,NA,51,11,NA)
)



i1 <- which(is.na(df1), arr.ind=TRUE)
l1 <- unique(split(i1[,2], i1[,1]))
i2 <- c(l1, setdiff(seq_along(df1), unlist(l1)))
l2 <- lapply(i2, function(i) df1[i]) 
l2[order(sapply(l2, function(x) colnames(x)[1]))]

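The result is below. head3 and head5 each appear in two groups, and head3 is grouped with head1 and head2 even though its NA pattern differs from theirs. The grouping I expect for df1 is {head1, head2}, {head3}, {head4}, {head5}: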

[[1]]
  head1 head2 head3
1    65    25    12
2    78     5    12
3    NA    NA    NA
4    76    32     6
5    67    32    NA

[[2]]
  head3 head5
1    12    76
2    12    NA
3    NA    51
4     6    11
5    NA    NA

[[3]]
  head4
1    65
2    12
3     5
4    94
5     1

[[4]]
  head5
1    76
2    NA
3    51
4    11
5    NA
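A small diagnostic sketch of why this happens: the code above builds one group per row that contains NAs, so a column that is NA in several rows can land in several groups. Keying each column by the rows where it is NA instead gives exactly one key per column. The variable names by_row and by_col are illustrative only.

```r
df1 <- data.frame(head1 = c(65, 78, NA, 76, 67),
                  head2 = c(25,  5, NA, 32, 32),
                  head3 = c(12, 12, NA,  6, NA),
                  head4 = c(65, 12,  5, 94,  1),
                  head5 = c(76, NA, 51, 11, NA))

# Grouping by row: each row with NAs yields the set of columns NA in that row
i1 <- which(is.na(df1), arr.ind = TRUE)
by_row <- split(i1[, "col"], i1[, "row"])
by_row

# Grouping by column: one key per column listing the rows where it is NA
by_col <- vapply(df1, function(col) paste(which(is.na(col)), collapse = ","),
                 character(1))
by_col
```

Row 3 yields columns 1, 2 and 3 together, and row 5 yields columns 3 and 5, so head3 shows up twice. The per-column keys are "3", "3", "3,5", "" and "2", so only head1 and head2 actually share a pattern.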

akrun · Accepted Answer

We get the row/column indices of the NA elements with which, specifying arr.ind=TRUE. We split the "col" indices by "row" and keep the unique index sets. If some columns have no NA values at all, we concatenate (c) them to the end of the list. Then we subset the dataset by looping over the list (lapply(i2, ...)) and order the output list ('l2') by the first column name of each element.

# df1 here is the data frame from the question (head3 = 12, NA, 12, 6, NA)
i1 <- which(is.na(df1), arr.ind=TRUE)
l1 <- unique(split(i1[,2], i1[,1]))
i2 <- c(l1, setdiff(seq_along(df1), unlist(l1)))
l2 <- lapply(i2, function(i) df1[i]) 
l2[order(sapply(l2, function(x) colnames(x)[1]))]
#[[1]]
#  head1 head2
#1    65    25
#2    78     5
#3    NA    NA
#4    76    32
#5    67    32

#[[2]]
#  head3 head5
#1    12    76
#2    NA    NA
#3    12    51
#4     6    11
#5    NA    NA

#[[3]]
#  head4
#1    65
#2    12
#3     5
#4    94
#5     1
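An alternative sketch (not part of the accepted answer) that groups by column rather than by row, so it handles both the question's df and the follow-up df1. It assumes two columns belong together only when they are NA in exactly the same rows. group_by_na_pattern is a hypothetical helper name.

```r
# Hypothetical helper: group the columns of a data frame by NA pattern
group_by_na_pattern <- function(d) {
  # One key per column: row positions of its NAs, e.g. "2,5"; "" if none
  key <- vapply(d, function(col) paste(which(is.na(col)), collapse = ","),
                character(1))
  # levels = unique(key) keeps groups in order of their first column
  idx <- split(seq_along(d), factor(key, levels = unique(key)))
  # Single-bracket subsetting keeps every element a data frame
  unname(lapply(idx, function(i) d[i]))
}

df <- data.frame(head1 = c(65, 78, NA, 76, 67),
                 head2 = c(25,  5, NA, 32, 32),
                 head3 = c(12, NA, 12,  6, NA),
                 head4 = c(65, 12,  5, 94,  1),
                 head5 = c(76, NA, 51, 11, NA))
df1 <- data.frame(head1 = c(65, 78, NA, 76, 67),
                  head2 = c(25,  5, NA, 32, 32),
                  head3 = c(12, 12, NA,  6, NA),
                  head4 = c(65, 12,  5, 94,  1),
                  head5 = c(76, NA, 51, 11, NA))

list1 <- group_by_na_pattern(df)
lapply(list1, names)  # groups: head1+head2, head3+head5, head4
list2 <- group_by_na_pattern(df1)
lapply(list2, names)  # groups: head1+head2, head3, head4, head5
```

Using the NA row positions as a string key is one simple choice; any exact representation of the per-column is.na() vector would work the same way.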
