Avi
Avi

Reputation: 2283

Removing both row and column of partial NA value

I have the following dataframe (s):

s<-read.table(text = "V1    V2  V3  V4  V5  V6  V7  V8  V9  V10 
  1 0   62  64  44  NA  55  81  66  57  53  
  2 0   0   65  50  NA  56  79  69  52  55  
  3 0   0   0   57  NA  62  84  76  65  59  
  4 0   0   0   0   NA  30  70  61  41  36  
  5 0   0   0   0   NA  NA  NA  NA  NA  NA  
  6 0   0   0   0   0   0   66  63  51  44  
  7 0   0   0   0   0   0   0   80  72  72  
  8 0   0   0   0   0   0   0   0   68  64  
  9 0   0   0   0   0   0   0   0   0   47  
  10    0   0   0   0   0   0   0   0   0   0   ", header = TRUE)

As can be seen row 5 and column 5 in this case includes only NA and 0 values. I would like to omit them and to keep the order of lines and columns. There might be more column and rows in the same pattern and I would like to do the same. The size of the dataframe might be changed. The final result would be:

    V1  V2  V3  V4  V6  V7  V8  V9  V10 
1   0   62  64  44  55  81  66  57  53  
2   0   0   65  50  56  79  69  52  55  
3   0   0   0   57  62  84  76  65  59  
4   0   0   0   0   30  70  61  41  36  
6   0   0   0   0   0   66  63  51  44  
7   0   0   0   0   0   0   80  72  72  
8   0   0   0   0   0   0   0   68  64  
9   0   0   0   0   0   0   0   0   47  
10  0   0   0   0   0   0   0   0   0   

Is there a way to get the omitted row and column number (in this case 5), as well?

Upvotes: 6

Views: 185

Answers (4)

user3652621
user3652621

Reputation: 3634

I was going to suggest:

sclean <- s[rowSums(s == 0|is.na(s)) != ncol(s) | (rowSums(s == 0, na.rm=TRUE) == ncol(s)),
        colSums(s == 0|is.na(s) )!= nrow(s) | colSums(s == 0, na.rm=TRUE) == nrow(s)]

Upvotes: 3

Ananta
Ananta

Reputation: 3711

You have to define more on when exactly you want to drop. In this case it looks like matrix at one side and diagonal always being 0.

However, In general, this is what I use

s[!rowSums(is.na(s))>1,!colSums(is.na(s))>1]

Considering 0's

s[!rowSums(is.na(s)|s==0)>9,!colSums(is.na(s)|s==0)>9]

Upvotes: 4

akrun
akrun

Reputation: 887851

We can try

v1 <- colSums(is.na(s))
v2 <- colSums(s==0, na.rm=TRUE)
j1 <- !(v1>0 & (v1+v2)==nrow(s) & v2 >0)

v3 <- rowSums(is.na(s))
v4 <- rowSums(s==0, na.rm=TRUE)
i1 <- !(v3>0 & (v3+v4)==ncol(s) & v3 >0)
s[i1, j1]
#   V1 V2 V3 V4 V6 V7 V8 V9 V10
#1   0 62 64 44 55 81 66 57  53
#2   0  0 65 50 56 79 69 52  55
#3   0  0  0 57 62 84 76 65  59
#4   0  0  0  0 30 70 61 41  36
#6   0  0  0  0  0 66 63 51  44
#7   0  0  0  0  0  0 80 72  72
#8   0  0  0  0  0  0  0 68  64
#9   0  0  0  0  0  0  0  0  47
#10  0  0  0  0  0  0  0  0   0

Suppose if we change one of the values in 's'

 s$V7[3] <- NA

By running the above code, the output will be

#   V1 V2 V3 V4 V6 V7 V8 V9 V10
#1   0 62 64 44 55 81 66 57  53
#2   0  0 65 50 56 79 69 52  55
#3   0  0  0 57 62 NA 76 65  59
#4   0  0  0  0 30 70 61 41  36
#6   0  0  0  0  0 66 63 51  44
#7   0  0  0  0  0  0 80 72  72
#8   0  0  0  0  0  0  0 68  64
#9   0  0  0  0  0  0  0  0  47
#10  0  0  0  0  0  0  0  0   0

NOTE: The OP's condition is includes only NA and 0 values. I would like to omit them

Upvotes: 3

lmo
lmo

Reputation: 38520

You could try the following:

myRowSums <- rowSums(is.na(s) | s == 0)
myColSums <- colSums(is.na(s) | s == 0)

sSmall <- s[which(myRowSums != ncol(s)), which(myColSums != nrow(s))]

It works for the following dataset to drop all columns and rows that are entirely made up of 0s and NAs.

s <- data.frame(a=c(0, rnorm(5), 0), b=c(0, rnorm(2), NA, NA,1, NA), c=c(rep(c(0,NA), 3), 0))

Upvotes: 1

Related Questions