Reputation: 2283
I have the following dataframe (s
):
s<-read.table(text = "V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 0 62 64 44 NA 55 81 66 57 53
2 0 0 65 50 NA 56 79 69 52 55
3 0 0 0 57 NA 62 84 76 65 59
4 0 0 0 0 NA 30 70 61 41 36
5 0 0 0 0 NA NA NA NA NA NA
6 0 0 0 0 0 0 66 63 51 44
7 0 0 0 0 0 0 0 80 72 72
8 0 0 0 0 0 0 0 0 68 64
9 0 0 0 0 0 0 0 0 0 47
10 0 0 0 0 0 0 0 0 0 0 ", header = TRUE)
As can be seen row 5 and column 5 in this case includes only NA
and 0
values. I would like to omit them and to keep the order of lines and columns. There might be more column and rows in the same pattern and I would like to do the same. The size of the dataframe might be changed.
The final result would be:
V1 V2 V3 V4 V6 V7 V8 V9 V10
1 0 62 64 44 55 81 66 57 53
2 0 0 65 50 56 79 69 52 55
3 0 0 0 57 62 84 76 65 59
4 0 0 0 0 30 70 61 41 36
6 0 0 0 0 0 66 63 51 44
7 0 0 0 0 0 0 80 72 72
8 0 0 0 0 0 0 0 68 64
9 0 0 0 0 0 0 0 0 47
10 0 0 0 0 0 0 0 0 0
Is there a way to get the omitted row and column number (in this case 5), as well?
Upvotes: 6
Views: 185
Reputation: 3634
I was going to suggest:
sclean <- s[rowSums(s == 0|is.na(s)) != ncol(s) | (rowSums(s == 0, na.rm=TRUE) == ncol(s)),
colSums(s == 0|is.na(s) )!= nrow(s) | colSums(s == 0, na.rm=TRUE) == nrow(s)]
Upvotes: 3
Reputation: 3711
You have to define more on when exactly you want to drop. In this case it looks like matrix at one side and diagonal always being 0.
However, In general, this is what I use
s[!rowSums(is.na(s))>1,!colSums(is.na(s))>1]
Considering 0's
s[!rowSums(is.na(s)|s==0)>9,!colSums(is.na(s)|s==0)>9]
Upvotes: 4
Reputation: 887851
We can try
v1 <- colSums(is.na(s))
v2 <- colSums(s==0, na.rm=TRUE)
j1 <- !(v1>0 & (v1+v2)==nrow(s) & v2 >0)
v3 <- rowSums(is.na(s))
v4 <- rowSums(s==0, na.rm=TRUE)
i1 <- !(v3>0 & (v3+v4)==ncol(s) & v3 >0)
s[i1, j1]
# V1 V2 V3 V4 V6 V7 V8 V9 V10
#1 0 62 64 44 55 81 66 57 53
#2 0 0 65 50 56 79 69 52 55
#3 0 0 0 57 62 84 76 65 59
#4 0 0 0 0 30 70 61 41 36
#6 0 0 0 0 0 66 63 51 44
#7 0 0 0 0 0 0 80 72 72
#8 0 0 0 0 0 0 0 68 64
#9 0 0 0 0 0 0 0 0 47
#10 0 0 0 0 0 0 0 0 0
Suppose if we change one of the values in 's'
s$V7[3] <- NA
By running the above code, the output will be
# V1 V2 V3 V4 V6 V7 V8 V9 V10
#1 0 62 64 44 55 81 66 57 53
#2 0 0 65 50 56 79 69 52 55
#3 0 0 0 57 62 NA 76 65 59
#4 0 0 0 0 30 70 61 41 36
#6 0 0 0 0 0 66 63 51 44
#7 0 0 0 0 0 0 80 72 72
#8 0 0 0 0 0 0 0 68 64
#9 0 0 0 0 0 0 0 0 47
#10 0 0 0 0 0 0 0 0 0
NOTE: The OP's condition is includes only NA and 0 values. I would like to omit them
Upvotes: 3
Reputation: 38520
You could try the following:
myRowSums <- rowSums(is.na(s) | s == 0)
myColSums <- colSums(is.na(s) | s == 0)
sSmall <- s[which(myRowSums != ncol(s)), which(myColSums != nrow(s))]
It works for the following dataset to drop all columns and rows that are entirely made up of 0s and NAs.
s <- data.frame(a=c(0, rnorm(5), 0), b=c(0, rnorm(2), NA, NA,1, NA), c=c(rep(c(0,NA), 3), 0))
Upvotes: 1