digit
digit

Reputation: 1523

How can I remove all cells with "NA" value by columns

This question is not a duplicate because my data.frame does not have the same amount of NA values in all Columns and therefore the solution mentioned in that question does not work.

I have a data.frame with a lot of NA values and I would like to delete all cells (important: not rows or columns, cells) that have NA values. The original would look like this:

A  B
1  NA
NA 2
2  NA
NA NA
NA NA
NA 4
3  5

The desired result would look like this:

A  B
1  2
2  4 
3  5

The number of columns would have to stay the same, but it does not matter if the values remain on the same rows. They can just be moved up.

I could image one could delete all cells with the condition NA (maybe with apply) and get the result. Or maybe a simple sorting ?

Thanks.

Update:

A   B   C
1       3
    2   
4       3

    1   2

3       5
        4
    9   
7       1

Upvotes: 4

Views: 6352

Answers (2)

Uwe
Uwe

Reputation: 42592

The OP has requested to remove the NAs columnwise but has pointed out that there might be different numbers of NA in each column.

This can be solved using data.table in two steps:

library(data.table)
# step 1: coerce to data.table in place, move NAs to the bottom of each column, 
# maintain the original order of non-NA values
result <- data.table(DF)[, lapply(.SD, function(x) x[order(is.na(x))])]
     A  B  C
 1:  1  2  3
 2:  4  1  3
 3:  3  9  2
 4:  7 NA  5
 5: NA NA  4
 6: NA NA  1
 7: NA NA NA
 8: NA NA NA
 9: NA NA NA
10: NA NA NA
# step 2: trim result
# either using Reduce
result[!result[, Reduce(`&`, lapply(.SD, is.na))]]

# or using zoo::na.trim()
zoo::na.trim(result, is.na = "all")
    A  B C
1:  1  2 3
2:  4  1 3
3:  3  9 2
4:  7 NA 5
5: NA NA 4
6: NA NA 1

So, there will be some NAs at the end of each colummn unavoidably because all columns in a data.frame have the same length.

Or, alternatively, only complete rows can be kept using the is.na parameter to na.trim():

zoo::na.trim(result, is.na = "any")
   A B C
1: 1 2 3
2: 4 1 3
3: 3 9 2

An alternative solution

As mentioned before, data.frames and cbind() expect all column vectors to have the same length. Here is an alternative solution without data.table which uses the cbind.fill() function from the rowr package which pads vectors with the fill value until the same length:

setNames(do.call(function(...) rowr::cbind.fill(..., fill = NA), lapply(DF, na.omit)),
         colnames(DF))
   A  B C
1  1  2 3
2  4  1 3
3  3  9 2
4  7 NA 5
5 NA NA 4
6 NA NA 1

Data

As supplied by the OP in the update:

DF <- structure(list(A = c(1L, NA, 4L, NA, NA, NA, 3L, NA, NA, 7L), 
    B = c(NA, 2L, NA, NA, 1L, NA, NA, NA, 9L, NA), C = c(3L, 
    NA, 3L, NA, 2L, NA, 5L, 4L, NA, 1L)), .Names = c("A", "B", 
"C"), row.names = c(NA, -10L), class = "data.frame")

Upvotes: 5

BENY
BENY

Reputation: 323386

You can try ,Base on @UweBlock's Data and your updated question.

dat=as.data.frame(na.omit(apply(dat,2,function (x) x[order(is.na(x))])))

dat
  A B C
1 1 2 3
2 4 1 3
3 3 9 2

Upvotes: 2

Related Questions